LinearRegressor
A model type for constructing a linear regressor, based on GLM.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
LinearRegressor = @load LinearRegressor pkg=GLM
Do model = LinearRegressor() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in LinearRegressor(fit_intercept=...).
LinearRegressor assumes the target is a continuous variable whose conditional distribution is normal with constant variance, and whose expected value is a linear combination of the features (identity link function). Options exist to specify an intercept or offset feature.
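The assumptions above can be written compactly as follows. The symbols are notation introduced here for illustration only and are not part of the GLM.jl or MLJ APIs: $x_i$ is the feature vector of observation $i$, $\beta$ the coefficient vector, $\beta_0$ the intercept (present only when an intercept is fitted), $o_i$ the value of the offset column (taken as zero when no offset is specified), and $\sigma^2$ the constant variance.

```math
y_i \mid x_i \sim \mathcal{N}(\mu_i, \sigma^2), \qquad \mu_i = \beta_0 + x_i^\top \beta + o_i
```

Because the link is the identity, the expected value $\mu_i$ is the linear predictor itself, and the offset enters with a fixed coefficient of 1.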
Training data
In MLJ or MLJBase, bind an instance model to data with one of:
mach = machine(model, X, y)
mach = machine(model, X, y, w)
Here:
X: any table of input features (e.g., a DataFrame) whose columns are of scitype Continuous; check the scitype with schema(X)
y: the target, which can be any AbstractVector whose element scitype is Continuous; check the scitype with scitype(y)
w: a vector of Real per-observation weights
Hyper-parameters
fit_intercept=true: Whether to calculate the intercept for this model. If set to false, no intercept will be calculated (e.g. the data is expected to be centered).
dropcollinear=false: Whether to drop features in the training data to ensure linear independence. If true, only the first of each set of linearly-dependent features is used. The coefficient for redundant linearly dependent features is 0.0 and all associated statistics are set to NaN.
offsetcol=nothing: Name of the column to be used as an offset, if any. An offset is a variable which is known to have a coefficient of 1.
report_keys: Vector of keys for the report. Possible keys are: :deviance, :dof_residual, :stderror, :vcov, :coef_table and :glm_model. By default only :glm_model is excluded.
Train the machine using fit!(mach, rows=...).
Operations
predict(mach, Xnew): return predictions of the target given new features Xnew having the same scitype as X above. Predictions are probabilistic.
predict_mean(mach, Xnew): instead return the mean of each prediction above.
predict_median(mach, Xnew): instead return the median of each prediction above.
Fitted parameters
The fields of fitted_params(mach) are:
features: The names of the features encountered during model fitting.
coef: The linear coefficients determined by the model.
intercept: The intercept determined by the model.
Report
When all keys are enabled in report_keys, the following fields are available in report(mach):
deviance: Measure of deviance of fitted model with respect to a perfectly fitted model. For a linear model, this is the weighted residual sum of squares.
dof_residual: The degrees of freedom for residuals, when meaningful.
stderror: The standard errors of the coefficients.
vcov: The estimated variance-covariance matrix of the coefficient estimates.
coef_table: Table which displays coefficients and summarizes their significance and confidence intervals.
glm_model: The raw fitted model returned by GLM.lm. Note this points to training data. Refer to the GLM.jl documentation for usage.
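As a sketch of how two of these quantities are defined, the deviance of a linear model and the relationship between the standard errors and the variance-covariance matrix can be written as below. The notation is introduced here for illustration and is not taken from the GLM.jl API: $w_i$ are the per-observation weights (taken as 1 when no weights are supplied), $\hat{\mu}_i$ is the fitted mean for observation $i$, and $\hat{V}$ is the matrix reported under vcov.

```math
D = \sum_{i=1}^{n} w_i \, (y_i - \hat{\mu}_i)^2, \qquad \mathrm{stderror}_j = \sqrt{\hat{V}_{jj}}
```

In words, each standard error is the square root of the corresponding diagonal entry of the variance-covariance matrix.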
Examples
using MLJ
LinearRegressor = @load LinearRegressor pkg=GLM
glm = LinearRegressor()
X, y = make_regression(100, 2) ## synthetic data
mach = machine(glm, X, y) |> fit!
Xnew, _ = make_regression(3, 2)
yhat = predict(mach, Xnew) ## probabilistic predictions
yhat_point = predict_mean(mach, Xnew) ## point predictions (mean of each distribution)
fitted_params(mach).features
fitted_params(mach).coef ## x1, x2, intercept
fitted_params(mach).intercept
report(mach)
See also LinearCountRegressor, LinearBinaryClassifier.