OneRuleClassifier

A model type for constructing a one rule classifier, based on OneRule.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

OneRuleClassifier = @load OneRuleClassifier pkg=OneRule

Do model = OneRuleClassifier() to construct an instance with default hyper-parameters.

OneRuleClassifier implements the OneRule method for classification by Robert Holte ("Very simple classification rules perform well on most commonly used datasets" in: Machine Learning 11.1 (1993), pp. 63-90).
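
To give an idea of how the method works: for every feature, 1R builds one rule per distinct feature value, predicting the most frequent target class among the training instances having that value, and it then keeps only the feature whose rule set makes the fewest training errors. Below is a minimal, illustrative sketch of this idea in plain Julia. It is not the code used by OneRule.jl, and the helper names (majority, one_rule) are made up for the example.

# Illustrative sketch of the 1R idea (not the OneRule.jl implementation).

# Most frequent element of a vector (ties broken arbitrarily).
function majority(xs)
    counts = Dict{eltype(xs),Int}()
    for x in xs
        counts[x] = get(counts, x, 0) + 1
    end
    best, bestcount = first(xs), 0
    for (x, c) in counts
        if c > bestcount
            best, bestcount = x, c
        end
    end
    return best
end

# For each feature, predict the per-value majority class and keep the
# feature with the lowest number of training errors.
function one_rule(features::NamedTuple, target::AbstractVector)
    best = (feature = :none, rules = Dict(), errors = typemax(Int))
    for (name, column) in pairs(features)
        rules = Dict(v => majority(target[column .== v]) for v in unique(column))
        errors = count(i -> rules[column[i]] != target[i], eachindex(target))
        if errors < best.errors
            best = (feature = name, rules = rules, errors = errors)
        end
    end
    return best
end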

For more information see:

- Witten, Ian H., Eibe Frank, and Mark A. Hall.
  Data Mining: Practical Machine Learning Tools and Techniques, Third Edition.
  Morgan Kaufmann, 2017, pp. 93-96.
- [Machine Learning - (One|Simple) Rule](https://datacadamia.com/data_mining/one_rule)
- [OneRClassifier - One Rule for Classification](http://rasbt.github.io/mlxtend/user_guide/classifier/OneRClassifier/)

Training data

In MLJ or MLJBase, bind an instance model to data with mach = machine(model, X, y) where

  • X: any table of input features (eg, a DataFrame) whose columns each have a Multiclass or OrderedFactor element scitype (i.e. some <:Finite scitype); check column scitypes with schema(X)
  • y: the target, which can be any AbstractVector whose element scitype is OrderedFactor or Multiclass; check the scitype with scitype(y)

Train the machine using fit!(mach, rows=...).
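
For example, a table with String (Textual) columns can be coerced and bound to a machine like this (the feature names and values are made up for illustration):

using MLJ
OneRuleClassifier = @load OneRuleClassifier pkg=OneRule

X = (color = ["red", "green", "red", "green"], size = ["small", "large", "small", "small"])
y = ["yes", "no", "yes", "yes"]

schema(X)                             # columns currently have Textual element scitype
X = coerce(X, Textual => Multiclass)  # make all columns Multiclass
y = coerce(y, Multiclass)             # make the target Multiclass

mach = machine(OneRuleClassifier(), X, y)
fit!(mach)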

Hyper-parameters

This classifier has no hyper-parameters.

Operations

  • predict(mach, Xnew): return (deterministic) predictions of the target given features Xnew having the same scitype as X above.

Fitted parameters

The fields of fitted_params(mach) are:

  • tree: the tree (a OneTree) returned by the core OneRule.jl algorithm
  • all_classes: all classes (i.e. levels) of the target (also used internally to pass the level information on to predict)

Report

The fields of report(mach) are:

  • tree: the OneTree built from the training data
  • nrules: the number of rules the tree contains
  • error_rate: the fraction of training instances that are misclassified
  • error_count: the number of training instances that are misclassified
  • classes_seen: list of target classes actually observed in training
  • features: the names of the features encountered in training
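
For example, after fitting a machine mach as described above, these fields can be inspected directly (field names as listed above):

r = report(mach)
r.nrules        # number of rules in the learned OneTree
r.error_rate    # fraction of misclassified training instances
r.features      # feature names seen during training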

Examples

using MLJ

ORClassifier = @load OneRuleClassifier pkg=OneRule

orc = ORClassifier()

outlook = ["sunny", "sunny", "overcast", "rainy", "rainy", "rainy", "overcast", "sunny", "sunny", "rainy",  "sunny", "overcast", "overcast", "rainy"]
temperature = ["hot", "hot", "hot", "mild", "cool", "cool", "cool", "mild", "cool", "mild", "mild", "mild", "hot", "mild"]
humidity = ["high", "high", "high", "high", "normal", "normal", "normal", "high", "normal", "normal", "normal", "high", "normal", "high"]
windy = ["false", "true", "false", "false", "false", "true", "true", "false", "false", "false", "true", "true", "false", "true"]

weather_data = (outlook = outlook, temperature = temperature, humidity = humidity, windy = windy)
play_data = ["no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no"]

weather = coerce(weather_data, Textual => Multiclass)
play = coerce(play_data, Multiclass)

mach = machine(orc, weather, play)
fit!(mach)

yhat = MLJ.predict(mach, weather)       # in a real application, predictions would be made on new weather data
one_tree = fitted_params(mach).tree
report(mach).error_rate
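
To sanity-check the reported training error against the predictions, an MLJ measure such as misclassification_rate can be applied (here on the training data, purely for illustration):

misclassification_rate(yhat, play)   # should agree with report(mach).error_rate
accuracy(yhat, play)                 # = 1 - misclassification rate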

See also OneRule.jl.