Reference
Types
- Finite
- Multiclass
- OrderedFactor
- Infinite
- Continuous
- Count
- Image
- ColorImage
- GrayImage
- Sampleable
- Density
- Table
- Textual
ScientificTypesBase.Finite — Type
Finite{N}
Scientific type for scalar, categorical data taking on one of N possible discrete values, which may or may not have a natural ordering.
Subtypes: Multiclass{N}, OrderedFactor{N}
Aliases: Binary==Finite{2}. Binary data can be unordered (Multiclass{2}) or ordered (OrderedFactor{2}).
See also scitype.
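The alias and subtype relations above can be checked directly. This is a minimal sketch, assuming ScientificTypes.jl is installed and exports these type names:

```julia
using ScientificTypes

# `Binary` is only an alias for `Finite{2}`
@assert Binary === Finite{2}

# Both subtypes of Finite{N} carry the same class count N
@assert Multiclass{2} <: Binary
@assert OrderedFactor{2} <: Binary
@assert !(Multiclass{3} <: Binary)          # three classes is not binary
@assert !(Multiclass{2} <: OrderedFactor)   # unordered and ordered are siblings
```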
ScientificTypesBase.Multiclass — Type
Multiclass{N}
Scientific type for scalar, categorical data with N possible values but no natural ordering for those classes (nominal data).
Examples: gender, team member, model number, product color, ethnicity, zipcode
Supertype: Finite{N}
See also scitype.
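A minimal sketch, assuming CategoricalArrays.jl is available: an unordered categorical vector gets a Multiclass scitype whose parameter N is the number of levels in its pool. The colors are made up for illustration.

```julia
using ScientificTypes, CategoricalArrays

# Nominal data: no natural ordering among the colors
colors = categorical(["red", "green", "blue", "green"])
@assert !isordered(colors)

# N = 3 because the pool has three levels
@assert scitype(colors[1]) == Multiclass{3}
@assert scitype(colors) == AbstractVector{Multiclass{3}}
```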
ScientificTypesBase.OrderedFactor — Type
OrderedFactor{N}
Scientific type for scalar, categorical data with N possible values with a natural ordering (ordinal data).
Includes the binary data scientific type OrderedFactor{2}, applying whenever it is natural to assign a "positive" class, for example, by a standard convention (e.g., "is toxic", "is an anomaly", "has the disease"). The "positive" class is the maximal class under the ordering. The distinction is important to disambiguate statistical metrics such as "number of true positives", "recall", etc.
Examples: letter grade in an exam, education level, number of stars in a review, safe/toxic, inlier/outlier, rejected/accepted.
Supertype: Finite{N}
See also scitype.
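One way to make the positive class explicit in binary ordinal data is to list the levels so that the positive class comes last. This sketch assumes CategoricalArrays.jl; the labels "healthy" and "sick" are illustrative only.

```julia
using ScientificTypes, CategoricalArrays

# Levels are listed from "negative" to "positive"
y = categorical(["healthy", "sick", "healthy"]; ordered=true, levels=["healthy", "sick"])
@assert scitype(y[1]) == OrderedFactor{2}

# The positive class is the maximal level under the ordering
@assert levels(y)[end] == "sick"
@assert y[2] > y[1]
```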
ScientificTypesBase.Infinite — Type
Infinite
Scientific type for scalar data with an intrinsic order, but of unbounded nature, either discrete or continuous.
Subtypes: Continuous, Count
See also scitype.
ScientificTypesBase.Continuous — Type
Continuous
Scientific type for continuous scalar data.
Examples: height, age, blood-pressure, weight, temperature.
Supertype: Infinite
See also scitype.
ScientificTypesBase.Count — Type
Count
Scientific type for discrete, ordered data, of unbounded nature.
Examples: number of phone calls per hour, number of building occupants, number of earthquakes per year over 6 on the Richter scale, number of unsaturated carbon-carbon bonds in a molecule.
Supertype: Infinite
See also scitype.
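A quick check of the default convention for floats and integers, assuming ScientificTypes.jl is loaded; the numbers are made up for illustration.

```julia
using ScientificTypes

@assert scitype(1.75) == Continuous   # e.g. a height in metres
@assert scitype(12) == Count          # e.g. phone calls in an hour
@assert Continuous <: Infinite && Count <: Infinite

# Counts can be reinterpreted as continuous measurements with coerce
@assert scitype(coerce([12, 7, 3], Continuous)) == AbstractVector{Continuous}
```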
ScientificTypesBase.Image — Type
Image{W,H}
Scientific type for image data, where W is the width and H the height.
Subtypes: GrayImage{W,H}, ColorImage{W,H}
See also scitype.
ScientificTypesBase.ColorImage — Type
ColorImage{W,H}
Scientific type for a color image, where W is the width and H the height.
Supertype: Image{W,H}
See also scitype.
ScientificTypesBase.GrayImage — Type
GrayImage{W,H}
Scientific type for a grey-scale image, where W is the width and H the height.
Supertype: Image{W,H}
See also scitype.
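A sketch, assuming ScientificTypes.jl performs the image conversion described under coerce: a 2D real array is coerced to a grey-scale image whose scitype records W and H, and both image kinds sit under Image{W,H}. The random pixel values are placeholders.

```julia
using ScientificTypes

# A single 2D array of intensities becomes one grey-scale image
img = coerce(rand(4, 3), GrayImage)
@assert scitype(img) == GrayImage{4, 3}

# Grey-scale and color images share the Image{W,H} supertype
@assert GrayImage{4, 3} <: Image{4, 3}
@assert ColorImage{4, 3} <: Image{4, 3}
```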
ScientificTypesBase.Sampleable — Type
Sampleable{Ω}
Scientific type for an object, such as a probability distribution, that can be sampled. Each individual sample x will satisfy scitype(x) isa Ω.
Subtype: Density{Ω}
See also scitype.
ScientificTypesBase.Density — Type
Density{Ω}
Scientific type for an object representing a probability density function or probability mass function, and more generally, for any probability measure that is absolutely continuous with respect to some standard measure on the sample space. Elements x of the sample space will satisfy scitype(x) isa Ω. Objects of this type can, at least in principle, be sampled.
Supertype: Sampleable{Ω}
See also scitype.
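The parameter Ω is shared between a Density and the Sampleable supertype it refines. A minimal sketch of these type relations, assuming ScientificTypes.jl exports Density and Sampleable:

```julia
using ScientificTypes

# Density{Ω} <: Sampleable{Ω} for the same sample-space scitype Ω
@assert Density{Continuous} <: Sampleable{Continuous}
@assert Density{Multiclass{3}} <: Sampleable{Multiclass{3}}

# Julia type parameters are invariant, so Ω must match exactly
@assert !(Density{Continuous} <: Sampleable{Count})
```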
ScientificTypesBase.Table — Type
Table{K}
Scientific type for tabular data. Here K will be a union of the scitypes of the columns (not the union of the element scitypes of the columns).
See also scitype.
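For example, a NamedTuple of equal-length vectors is a valid table under Tables.jl. In this sketch (data made up), K is the union of the column scitypes AbstractVector{Continuous} and AbstractVector{Count}, not of Continuous and Count:

```julia
using ScientificTypes

X = (height = [1.62, 1.80, 1.75], visits = [3, 0, 5])

@assert scitype(X) <: Table
@assert scitype(X) == Table{Union{AbstractVector{Continuous}, AbstractVector{Count}}}
```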
ScientificTypesBase.Textual — Type
Textual
Scientific type for text data playing some linguistic role, for example in sentiment analysis. This is to be contrasted with text used simply to label classes of a categorical variable; see instead Finite.
Examples: survey questions with discursive answers, text to be translated into a new language, vocabularies, email messages.
See also scitype.
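The sketch below contrasts free text with strings that merely label classes; the latter can be coerced to a Finite scitype. It assumes ScientificTypes.jl and CategoricalArrays.jl are installed, and the strings are made up.

```julia
using ScientificTypes, CategoricalArrays

# Free text playing a linguistic role
@assert scitype("Great product, would buy again!") == Textual

# Strings used only as class labels: coerce them to a Finite scitype
answers = coerce(["yes", "no", "no", "yes"], Multiclass)
@assert scitype(answers) == AbstractVector{Multiclass{2}}
```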
Methods
ScientificTypes.scitype — Function
scitype(X)
Return the scientific type (interpretation) of X, as distinct from its machine type. Atomic scientific types (Continuous, Multiclass, etc.) are mostly abstract types defined in the package ScientificTypesBase.jl. Scientific types do not ordinarily have instances.
Third-party packages may extend the behavior of scitype; objects previously having Unknown scitype may no longer do so.
To display the active scientific type hierarchy (excluding Missing and Nothing) do scitype().
Examples
julia> scitype(3.14)
Continuous
julia> scitype([1, 2, missing])
AbstractVector{Union{Missing, Count}}
julia> scitype((5, "beige"))
Tuple{Count, Textual}
julia> using CategoricalArrays
julia> table = (gender = categorical(['M', 'M', 'F', 'M', 'F']),
ndevices = [1, 3, 2, 3, 2])
julia> scitype(table)
Table{Union{AbstractVector{Count}, AbstractVector{Multiclass{2}}}}
Column scitypes of a table can also be inspected with schema.
The behavior of scitype is detailed in the ScientificTypes documentation. Key features of the default behavior are:
- AbstractFloat has scitype as Continuous <: Infinite.
- Any Integer has scitype as Count <: Infinite.
- Any CategoricalValue x has scitype as Multiclass <: Finite or OrderedFactor <: Finite, depending on the value of isordered(x).
- Strings and Chars do not have scitype Multiclass or OrderedFactor; they have scitypes Textual and Unknown respectively.
- The scientific types of nothing and missing are Nothing and Missing, Julia types that are also regarded as scientific.
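Each rule in the list above can be exercised directly. A minimal sketch, assuming ScientificTypes.jl and CategoricalArrays.jl are installed:

```julia
using ScientificTypes, CategoricalArrays

@assert scitype(2.5f0) == Continuous    # any AbstractFloat
@assert scitype(Int8(3)) == Count       # any Integer
@assert scitype("x") == Textual         # String
@assert scitype('x') == Unknown         # Char
@assert scitype(missing) == Missing
@assert scitype(nothing) == Nothing

# CategoricalValue: the scitype depends on isordered
@assert scitype(categorical(["lo", "hi", "lo"])[1]) == Multiclass{2}
@assert scitype(categorical(["lo", "hi", "lo"]; ordered=true)[1]) == OrderedFactor{2}
```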
scitype(; io=stdout)
Print to io the scitype hierarchy, beginning at Found (and so excluding Missing and Nothing).
Note that third party packages can extend the hierarchy, so output is not static.
ScientificTypes.schema — Function
schema(X)
Inspect the column types and scitypes of a tabular object. Returns nothing if the column types and/or scitypes can't be inspected.
Example
X = (ncalls=[1, 2, 4], mean_delay=[2.0, 5.7, 6.0])
schema(X)
ScientificTypes.coerce — Function
coerce(A, S)
Return a new version of the array A whose scientific element type is S.
julia> v = coerce([3, 7, 5], Continuous)
3-element Vector{Float64}:
3.0
7.0
5.0
julia> scitype(v)
AbstractVector{Continuous}
coerce(X, specs...; tight=false, verbosity=1)
Given a table X, return a copy of X, ensuring that the element scitypes of the columns match the new specification, specs. There are three valid specifications:
(i) one or more column_name=>Scitype pairs:
coerce(X, col1=>Scitype1, col2=>Scitype2, ... ; verbosity=1)
(ii) one or more OldScitype=>NewScitype pairs (OldScitype covering both the OldScitype and Union{Missing,OldScitype} cases):
coerce(X, OldScitype1=>NewScitype1, OldScitype2=>NewScitype2, ... ; verbosity=1)
(iii) a dictionary of scientific types keyed on column names:
coerce(X, d::AbstractDict{<:ColKey, <:Type}; verbosity=1)
where ColKey = Union{Symbol,AbstractString}.
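Form (iii) can be sketched as follows, with made-up data; it assumes ScientificTypes.jl and CategoricalArrays.jl are installed. The explicit Dict{Symbol,Type} element type guarantees that the argument matches AbstractDict{<:ColKey, <:Type}.

```julia
using ScientificTypes, CategoricalArrays

X = (x = [1, 2, 3], y = [0.5, 0.1, 0.9], z = [10, 20, 10])

# Keys are column names; values are the target scitypes.
# Columns not mentioned (here y) are left unchanged.
d = Dict{Symbol,Type}(:x => Continuous, :z => Multiclass)
Xc = coerce(X, d)

@assert schema(Xc).scitypes == (Continuous, Continuous, Multiclass{2})
```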
Examples
Specifying column_name=>Scitype pairs:
using CategoricalArrays, DataFrames, Tables
X = DataFrame(name=["Siri", "Robo", "Alexa", "Cortana"],
height=[152, missing, 148, 163],
rating=[1, 5, 2, 1])
Xc = coerce(X, :name=>Multiclass, :height=>Continuous, :rating=>OrderedFactor)
schema(Xc).scitypes # (Multiclass{4}, Union{Missing, Continuous}, OrderedFactor{3})
Specifying OldScitype=>NewScitype pairs:
X = (x = [1, 2, 3],
y = rand(3),
z = [10, 20, 30])
Xc = coerce(X, Count=>Continuous)
schema(Xc).scitypes # (Continuous, Continuous, Continuous)
coerce(image::AbstractArray{<:Real, N}, I)
Given an array called image representing one or more images, return a transformed version of the data so as to enforce an appropriate scientific interpretation I:
| single or collection? | N | I | scitype of result |
|---|---|---|---|
| single | 2 | GrayImage | GrayImage{W,H} |
| single | 3 | ColorImage | ColorImage{W,H} |
| collection | 3 | GrayImage | AbstractVector{<:GrayImage} |
| collection | 4 (W x H x {1} x C) | GrayImage | AbstractVector{<:GrayImage} |
| collection | 4 | ColorImage | AbstractVector{<:ColorImage} |
imgs = rand(10, 10, 3, 5)
v = coerce(imgs, ColorImage)
julia> typeof(v)
Vector{Matrix{ColorTypes.RGB{Float64}}}
julia> scitype(v)
AbstractVector{ColorImage{10, 10}}
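The grey-scale collection rows of the table can be exercised the same way. This sketch assumes, as in the color example, that the last dimension indexes the images; the pixel values are random placeholders.

```julia
using ScientificTypes

# N = 3: a collection of 5 grey-scale images, each 10 x 10
v3 = coerce(rand(10, 10, 5), GrayImage)
@assert length(v3) == 5
@assert scitype(v3) == AbstractVector{GrayImage{10, 10}}

# N = 4 with a singleton third dimension (W x H x 1 x C)
v4 = coerce(rand(10, 10, 1, 5), GrayImage)
@assert scitype(v4) == AbstractVector{GrayImage{10, 10}}
```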
ScientificTypes.autotype — Function
autotype(X; kw...)
Return a dictionary of suggested scitypes for each column of X, a table or an array, based on rules.
Kwargs
- only_changes=true: if true, return only a dictionary of the names for which applying autotype differs from just using the ambient convention. When coercing with autotype, only_changes should be true.
- rules=(:few_to_finite,): the set of rules to apply.
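A hedged usage sketch with made-up data. The exact scitype suggested for a column depends on the internal threshold of the :few_to_finite rule, so the checks only assume that a two-valued string column over 100 rows counts as having "few" values and is mapped to some Finite scitype.

```julia
using ScientificTypes, CategoricalArrays

# 100 rows; `answer` takes only two distinct values
X = (answer = repeat(["yes", "no"], 50), score = rand(100))

# Suggested changes only, using the default rules=(:few_to_finite,)
d = autotype(X; only_changes=true)
Xc = coerce(X, d)

@assert haskey(d, :answer) && d[:answer] <: Finite
@assert scitype(Xc.answer) <: AbstractVector{<:Finite}
```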