LinearRegressor

A model type for constructing a linear regressor, based on GLM.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

LinearRegressor = @load LinearRegressor pkg=GLM

Do model = LinearRegressor() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in LinearRegressor(fit_intercept=...).

LinearRegressor assumes the target is a continuous variable whose conditional distribution is normal with constant variance, and whose expected value is a linear combination of the features (identity link function). Options exist to specify an intercept or offset feature.
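In symbols (the notation here is ours, not GLM.jl's), the model for observation i with features x_i1 through x_ip reads as follows. The intercept β₀ is present only when fit_intercept=true, and the offset o_i only when offsetcol is set.

```latex
y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + o_i + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2),
\qquad \mathbb{E}[y_i \mid x_i] = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + o_i .
```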

Training data

In MLJ or MLJBase, bind an instance model to data with one of:

mach = machine(model, X, y)
mach = machine(model, X, y, w)

Here

  • X: any table of input features (e.g., a DataFrame) whose columns are of scitype Continuous; check the scitype with schema(X).
  • y: the target, which can be any AbstractVector whose element scitype is Continuous; check the scitype with scitype(y).
  • w: a vector of Real per-observation weights.
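When weights w are supplied, the fit minimizes the weighted residual sum of squares Σᵢ wᵢ (yᵢ − ŷᵢ)². The stdlib-only Julia sketch below illustrates that objective by scaling each row of the design matrix and the target by √wᵢ before an ordinary least-squares solve. It is not the GLM.jl code path. The helper name weighted_ls and the choice to append the intercept column last are assumptions made for illustration.

```julia
using LinearAlgebra

# Hypothetical helper: weighted least squares with an intercept column
# appended last (the column order is an assumption of this sketch).
function weighted_ls(X::AbstractMatrix, y::AbstractVector, w::AbstractVector)
    A = hcat(X, ones(size(X, 1)))  # design matrix: features, then intercept
    s = sqrt.(w)                   # row scaling factors sqrt(w_i)
    β = (s .* A) \ (s .* y)        # ordinary least squares on the scaled rows
    return β[1:end-1], β[end]      # (coefficients, intercept)
end

# Usage: noiseless data generated from y = 2*x1 - x2 + 3
X = [1.0 0.0; 0.0 1.0; 1.0 1.0; 2.0 1.0; 3.0 5.0]
y = X * [2.0, -1.0] .+ 3.0
w = [1.0, 2.0, 1.0, 0.5, 3.0]
coef, intercept = weighted_ls(X, y, w)  # recovers [2.0, -1.0] and 3.0 up to rounding
```

Scaling rows by √wᵢ turns the weighted problem into an unweighted one, so the same solver handles both cases.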

Hyper-parameters

  • fit_intercept=true: Whether to calculate the intercept for this model. If set to false, no intercept will be calculated (e.g., when the data are already centered).
  • dropcollinear=false: Whether to drop features in the training data to ensure linear independence. If true, only the first of each set of linearly dependent features is used. The coefficients of the redundant linearly dependent features are set to 0.0 and all associated statistics are set to NaN.
  • offsetcol=nothing: Name of the column to be used as an offset, if any. An offset is a variable whose coefficient is known to be 1.
  • report_keys: Vector of keys for the report. Possible keys are :deviance, :dof_residual, :stderror, :vcov, :coef_table and :glm_model. By default, all keys except :glm_model are included.
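The sketch below, again using only Julia's LinearAlgebra standard library, illustrates what fit_intercept, offsetcol and dropcollinear=true do to the least-squares problem. Both helpers, fit_linear and first_independent_columns, are hypothetical. The offset is subtracted from the target because its coefficient is fixed at 1, and the collinearity check is a simple greedy rank test rather than the decomposition GLM.jl actually uses.

```julia
using LinearAlgebra

# Hypothetical helper for fit_intercept and offsetcol. The offset has a fixed
# coefficient of 1, so it is subtracted from the target before solving; a
# column of ones is appended only when fit_intercept is true.
function fit_linear(X::AbstractMatrix, y::AbstractVector;
                    fit_intercept::Bool=true, offset=nothing)
    z = offset === nothing ? y : y .- offset
    A = fit_intercept ? hcat(X, ones(size(X, 1))) : X
    β = A \ z
    return fit_intercept ? (β[1:end-1], β[end]) : (β, 0.0)
end

# Hypothetical helper for the idea behind dropcollinear=true: scan columns left
# to right and keep a column only if it is not a linear combination of the
# columns already kept. This is a greedy rank test, not GLM.jl's algorithm.
function first_independent_columns(X::AbstractMatrix; tol::Real=1e-8)
    keep = Int[]
    for j in axes(X, 2)
        candidate = vcat(keep, j)
        if rank(X[:, candidate]; atol=tol) == length(candidate)
            push!(keep, j)
        end
    end
    return keep
end

# Usage
X = [1.0 0.0; 2.0 1.0; 3.0 1.0; 4.0 3.0]
off = [0.5, 0.5, 1.0, 1.0]
y = X * [1.5, -2.0] .+ 4.0 .+ off               # the offset enters with coefficient 1
coef, intercept = fit_linear(X, y; offset=off)  # ≈ [1.5, -2.0] and 4.0

Xdup = hcat(X, 2 .* X[:, 1])     # third column is twice the first
first_independent_columns(Xdup)  # returns [1, 2]
```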

Train the machine using fit!(mach, rows=...).

Operations

  • predict(mach, Xnew): return predictions of the target given new features Xnew having the same scitype as X above. Predictions are probabilistic: each one is a normal distribution.
  • predict_mean(mach, Xnew): instead return the mean of each prediction above.
  • predict_median(mach, Xnew): instead return the median of each prediction above.
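A minimal sketch of the three operations follows. It assumes, in line with the model description above, that each probabilistic prediction is a normal distribution centred on the linear predictor, with a single standard deviation estimated from the training residuals as √(RSS / dof_residual). To stay dependency-free, distributions are represented as (μ, σ) named tuples rather than Distributions.jl objects. The type LinearFit and all functions defined here are hypothetical stand-ins, not the MLJ methods of the same names.

```julia
using LinearAlgebra

# Hypothetical container for a fitted model (not an MLJ or GLM.jl type).
struct LinearFit
    coef::Vector{Float64}
    intercept::Float64
    σ::Float64   # common standard deviation of every predictive distribution
end

# Fit by least squares; σ is estimated as sqrt(RSS / dof_residual).
function fit_normal_linear(X::AbstractMatrix, y::AbstractVector)
    A = hcat(X, ones(size(X, 1)))
    β = A \ y
    r = y - A * β
    dof_residual = size(A, 1) - size(A, 2)
    return LinearFit(β[1:end-1], β[end], sqrt(sum(abs2, r) / dof_residual))
end

# Stand-ins for the three operations; a normal distribution is represented
# here as a (μ, σ) named tuple.
predict_mean(f::LinearFit, Xnew::AbstractMatrix) = Xnew * f.coef .+ f.intercept
predict_median(f::LinearFit, Xnew::AbstractMatrix) = predict_mean(f, Xnew)  # symmetric
predict(f::LinearFit, Xnew::AbstractMatrix) =
    [(μ = m, σ = f.σ) for m in predict_mean(f, Xnew)]

# Usage
X = [1.0 0.0; 2.0 1.0; 3.0 1.0; 4.0 3.0; 5.0 2.0]
y = [1.1, 2.9, 4.2, 4.8, 6.9]
f = fit_normal_linear(X, y)
Xnew = [6.0 2.0; 7.0 4.0]
predict(f, Xnew)  # two (μ, σ) named tuples, one per row of Xnew
```

Because a normal distribution is symmetric, predict_mean and predict_median coincide for this model.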

Fitted parameters

The fields of fitted_params(mach) are:

  • features: The names of the features encountered during model fitting.
  • coef: The linear coefficients determined by the model.
  • intercept: The intercept determined by the model.

Report

When all keys are enabled in report_keys, the following fields are available in report(mach):

  • deviance: Measure of deviance of the fitted model with respect to a perfectly fitted model. For a linear model, this is the weighted residual sum of squares.
  • dof_residual: The degrees of freedom for residuals, when meaningful.
  • stderror: The standard errors of the coefficients.
  • vcov: The estimated variance-covariance matrix of the coefficient estimates.
  • coef_table: Table which displays coefficients and summarizes their significance and confidence intervals.
  • glm_model: The raw fitted model returned by GLM.lm. Note that this object holds a reference to the training data. Refer to the GLM.jl documentation for usage.
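The numeric report fields follow standard least-squares formulas: with σ̂² = deviance / dof_residual, the covariance matrix is vcov = σ̂² (AᵀA)⁻¹ and each standard error is the square root of a diagonal entry. The sketch below computes these for an unweighted fit with the intercept column appended last. The helper ols_report is hypothetical, and GLM.jl derives the same quantities from its own matrix decomposition rather than an explicit inverse.

```julia
using LinearAlgebra

# Hypothetical helper: report quantities for an unweighted least-squares fit
# with the intercept column appended last.
function ols_report(X::AbstractMatrix, y::AbstractVector)
    A = hcat(X, ones(size(X, 1)))
    β = A \ y
    r = y - A * β
    deviance = sum(abs2, r)                  # residual sum of squares
    dof_residual = size(A, 1) - size(A, 2)   # observations minus parameters
    σ2 = deviance / dof_residual             # estimated error variance
    vcov = σ2 * inv(A' * A)                  # covariance matrix of the estimates
    stderror = sqrt.(diag(vcov))             # standard errors of the coefficients
    t = β ./ stderror                        # t statistics, as in a coefficient table
    return (coef = β, deviance = deviance, dof_residual = dof_residual,
            vcov = vcov, stderror = stderror, t = t)
end

# Usage
X = [1.0 0.0; 2.0 1.0; 3.0 1.0; 4.0 3.0; 5.0 2.0]
y = [1.1, 2.9, 4.2, 4.8, 6.9]
rep = ols_report(X, y)
rep.stderror  # one standard error per coefficient: x1, x2, intercept
```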

Examples

using MLJ
LinearRegressor = @load LinearRegressor pkg=GLM
glm = LinearRegressor()

X, y = make_regression(100, 2) ## synthetic data
mach = machine(glm, X, y) |> fit!

Xnew, _ = make_regression(3, 2)
yhat = predict(mach, Xnew) ## new predictions
yhat_point = predict_mean(mach, Xnew) ## point predictions (means)

fitted_params(mach).features
fitted_params(mach).coef ## coefficients of x1, x2 (intercept is reported separately)
fitted_params(mach).intercept

report(mach)

See also LinearCountRegressor, LinearBinaryClassifier