Linear models. This module is styled after scikit-learn's linear_model module: https://scikit-learn.org/stable/modules/linear_model.html.
Classes
LinearRegression
LinearRegression(
    *,
    optimize_strategy: typing.Literal[
        "auto_strategy", "batch_gradient_descent", "normal_equation"
    ] = "auto_strategy",
    fit_intercept: bool = True,
    l1_reg: typing.Optional[float] = None,
    l2_reg: float = 0.0,
    max_iterations: int = 20,
    warm_start: bool = False,
    learning_rate: typing.Optional[float] = None,
    learning_rate_strategy: typing.Literal["line_search", "constant"] = "line_search",
    tol: float = 0.01,
    ls_init_learning_rate: typing.Optional[float] = None,
    calculate_p_values: bool = False,
    enable_global_explain: bool = False
)
Ordinary least squares Linear Regression.
LinearRegression fits a linear model with coefficients w = (w1, ..., wp) to minimize the residual sum of squares between the observed targets in the dataset and the targets predicted by the linear approximation.
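The residual sum of squares has a closed-form minimizer: the weights solve the normal equations (X^T X) w = X^T y. The "normal_equation" value of optimize_strategy appears to refer to this approach. The plain-Python sketch below solves those equations for a small dense problem as an illustration only. `solve` and `fit_ols` are hypothetical helpers, not part of bigframes, and the sketch does not show how BigQuery ML actually computes the model.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so each row operation updates both sides.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float],
            fit_intercept: bool = True) -> List[float]:
    """Weights minimizing the residual sum of squares; intercept first if fit_intercept."""
    # A constant column of ones lets the intercept be learned like any other weight.
    x = [([1.0] if fit_intercept else []) + list(r) for r in rows]
    p = len(x[0])
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical toy data generated exactly from y = 1 + 2*x1 - 3*x2.
rows = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 2.0], [4.0, 0.0]]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]
print([round(w, 6) for w in fit_ols(rows, y)])  # → [1.0, 2.0, -3.0]
```

The data is built from known coefficients, so the sketch recovers them up to floating-point error. A direct solve like this also assumes that X^T X is invertible, which means no feature can be an exact linear combination of the others.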
Examples:
>>> from bigframes.ml.linear_model import LinearRegression
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> X = bpd.DataFrame({
...     "feature0": [20, 21, 19, 18],
...     "feature1": [0, 1, 1, 0],
...     "feature2": [0.2, 0.3, 0.4, 0.5]})
>>> y = bpd.DataFrame({"outcome": [0, 0, 1, 1]})
>>> # Create the linear model
>>> model = LinearRegression()
>>> model.fit(X, y)
LinearRegression()
>>> # Score the model
>>> score = model.score(X, y)
>>> print(score) # doctest:+SKIP
   mean_absolute_error  mean_squared_error  mean_squared_log_error  \
0             0.022812            0.000602                 0.00035

   median_absolute_error  r2_score  explained_variance
0               0.015077  0.997591            0.997591
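Each column of the score() output is a standard regression metric. The plain-Python sketch below uses the textbook definitions to show what each column measures. It computes mean_squared_log_error with the natural log, as scikit-learn does. Whether BigQuery ML computes these metrics in exactly the same way is an assumption, and `regression_metrics` is a hypothetical helper.

```python
import math
import statistics
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Textbook versions of the columns reported by LinearRegression.score()."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    mean_true = statistics.fmean(y_true)
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    return {
        "mean_absolute_error": statistics.fmean(abs_errors),
        "mean_squared_error": ss_res / len(errors),
        # log1p(v) = log(1 + v); requires targets and predictions greater than -1.
        "mean_squared_log_error": statistics.fmean(
            (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
        ),
        "median_absolute_error": statistics.median(abs_errors),
        "r2_score": 1 - ss_res / ss_tot,
        # Like r2_score, but a constant offset in the errors is not penalized.
        "explained_variance": 1 - statistics.pvariance(errors) / statistics.pvariance(y_true),
    }


# Hypothetical predictions that are systematically too high on half the rows.
m = regression_metrics([0, 0, 1, 1], [0.0, 0.5, 1.0, 1.5])
print(m["r2_score"], m["explained_variance"])  # → 0.5 0.75
```

The example shows why r2_score and explained_variance can differ. When the errors share a common bias, explained_variance ignores that bias but r2_score does not. In the score() output above, the two values are equal, which is consistent with the residuals of that fit having a mean of essentially zero.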
Parameters

| Name | Description |
|---|---|
| optimize_strategy | str, default "auto_strategy". The strategy used to train linear regression models. Possible values are "auto_strategy", "batch_gradient_descent", and "normal_equation". |
| fit_intercept | bool, default True. Specifies whether to fit an intercept (a constant bias term) for this model. |
| l1_reg | float or None, default None. The amount of L1 regularization applied. Can't be set when optimize_strategy is "normal_equation". If unset, 0 is used. |
| l2_reg | float, default 0.0. The amount of L2 regularization applied. |
| max_iterations | int, default 20. The maximum number of training iterations or steps. |
| warm_start | bool, default False. Determines whether to train a model with new training data, new model options, or both. Unless you explicitly override them, the options used to train the model initially are reused for the warm-start run. |
| learning_rate | float or None, default None. The learning rate for gradient descent when learning_rate_strategy='constant'. If unset, 0.1 is used. Setting it when learning_rate_strategy='line_search' returns an error. |
| learning_rate_strategy | str, default "line_search". The strategy for specifying the learning rate during training. Possible values are "line_search" and "constant". |
| tol | float, default 0.01. The minimum relative loss improvement required to continue training when EARLY_STOP is set to true. For example, a value of 0.01 means that each iteration must reduce the loss by at least 1% for training to continue. |
| ls_init_learning_rate | float or None, default None. The initial learning rate used when learning_rate_strategy='line_search'. This option can only be used with line search. If unset, 0.1 is used. |
| calculate_p_values | bool, default False. Specifies whether to compute p-values and standard errors during training. |
| enable_global_explain | bool, default False. Whether to compute global explanations with Explainable AI to evaluate the global importance of the features to the model. |
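In batch gradient descent, the optimization options work together roughly as follows. Each iteration steps along the negative gradient. The step size is either fixed (learning_rate_strategy='constant' with learning_rate) or found by a line search that starts from ls_init_learning_rate. Training ends after max_iterations, or earlier once the relative loss improvement falls below tol.

The plain-Python sketch below illustrates that loop for a linear model with an L2 penalty. It is a generic sketch under those assumptions, not BigQuery ML's optimizer. The backtracking (Armijo) line search, the penalty scaling, and the helpers `loss`, `gradient`, and `gradient_descent` are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def predict_row(w: List[float], b: float, x: Sequence[float]) -> float:
    """Linear prediction b + w . x for one row."""
    return b + sum(wj * xj for wj, xj in zip(w, x))


def loss(w, b, rows, y, l2_reg):
    """Mean squared error plus an L2 penalty on the weights (the intercept is not penalized)."""
    mse = sum((predict_row(w, b, x) - yi) ** 2 for x, yi in zip(rows, y)) / len(y)
    return mse + l2_reg * sum(wj * wj for wj in w)


def gradient(w, b, rows, y, l2_reg):
    """Gradient of loss() with respect to the weights and the intercept."""
    n = len(y)
    gw = [2 * l2_reg * wj for wj in w]
    gb = 0.0
    for x, yi in zip(rows, y):
        r = predict_row(w, b, x) - yi  # residual for this row
        gb += 2 * r / n
        for j, xj in enumerate(x):
            gw[j] += 2 * r * xj / n
    return gw, gb


def gradient_descent(rows, y, *, l2_reg=0.0, max_iterations=20,
                     learning_rate_strategy="line_search", learning_rate=0.1,
                     ls_init_learning_rate=0.1, tol=0.01) -> Tuple[List[float], float, float, int]:
    """Batch gradient descent; returns (weights, intercept, final_loss, iterations_run)."""
    w, b = [0.0] * len(rows[0]), 0.0
    prev = loss(w, b, rows, y, l2_reg)
    iterations = 0
    for iterations in range(1, max_iterations + 1):
        gw, gb = gradient(w, b, rows, y, l2_reg)
        if learning_rate_strategy == "constant":
            step = learning_rate
        else:
            # Backtracking line search: start from ls_init_learning_rate and halve the
            # step until the Armijo sufficient-decrease condition holds.
            step = ls_init_learning_rate
            sq_norm = sum(g * g for g in gw) + gb * gb
            while loss([wj - step * g for wj, g in zip(w, gw)], b - step * gb,
                       rows, y, l2_reg) > prev - 0.5 * step * sq_norm:
                step /= 2
        w = [wj - step * g for wj, g in zip(w, gw)]
        b -= step * gb
        cur = loss(w, b, rows, y, l2_reg)
        # Early stopping: quit once the relative improvement drops below tol
        # (tol=0.01 means each iteration must cut the loss by at least 1%).
        improvement = (prev - cur) / prev if prev > 0 else 0.0
        prev = cur
        if improvement < tol:
            break
    return w, b, prev, iterations


# Hypothetical toy data from y = 1 + 2x.
rows = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]
w, b, final_loss, n_iter = gradient_descent(rows, y, learning_rate_strategy="constant",
                                            max_iterations=300, tol=0.0)
print(round(w[0], 3), round(b, 3))  # → 2.0 1.0
```

The sketch shows two tradeoffs. First, a larger tol stops training sooner, at the cost of a less converged model. Second, a positive l2_reg pulls the fitted weight toward zero.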
LogisticRegression
LogisticRegression(
    *,
    optimize_strategy: typing.Literal[
        "auto_strategy", "batch_gradient_descent"
    ] = "auto_strategy",
    fit_intercept: bool = True,
    l1_reg: typing.Optional[float] = None,
    l2_reg: float = 0.0,
    max_iterations: int = 20,
    warm_start: bool = False,
    learning_rate: typing.Optional[float] = None,
    learning_rate_strategy: typing.Literal["line_search", "constant"] = "line_search",
    tol: float = 0.01,
    ls_init_learning_rate: typing.Optional[float] = None,
    calculate_p_values: bool = False,
    enable_global_explain: bool = False,
    class_weight: typing.Optional[
        typing.Union[typing.Literal["balanced"], typing.Dict[str, float]]
    ] = None
)
Logistic Regression (aka logit, MaxEnt) classifier.
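A binary logistic regression classifier scores each row with a linear function z = b + w·x. It then maps that score to a probability with the logistic (sigmoid) function 1 / (1 + e^(-z)), and predicts the label with the highest probability.

The plain-Python sketch below shows that prediction step. The weights are hypothetical, and the helpers `sigmoid`, `predict_proba`, and `predict` are not part of bigframes. The list of label/probability dicts only loosely mirrors the shape of the predicted_outcome_probs column in the example that follows.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x: Sequence[float], weights: Sequence[float], intercept: float) -> List[dict]:
    """Label/probability pairs for a binary model with labels 1 and 0."""
    p1 = sigmoid(intercept + sum(w * xi for w, xi in zip(weights, x)))
    return [{"label": 1, "prob": p1}, {"label": 0, "prob": 1.0 - p1}]


def predict(x: Sequence[float], weights: Sequence[float], intercept: float) -> int:
    """Label with the highest probability (same as predicting 1 when p1 >= 0.5)."""
    probs = predict_proba(x, weights, intercept)
    return max(probs, key=lambda d: d["prob"])["label"]


# Hypothetical one-feature model: weight 1.0, intercept 0.0.
print(predict([2.0], [1.0], 0.0))   # → 1
print(predict([-2.0], [1.0], 0.0))  # → 0
```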
Examples:
>>> from bigframes.ml.linear_model import LogisticRegression
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> X = bpd.DataFrame({
...     "feature0": [20, 21, 19, 18],
...     "feature1": [0, 1, 1, 0],
...     "feature2": [0.2, 0.3, 0.4, 0.5]})
>>> y = bpd.DataFrame({"outcome": [0, 0, 1, 1]})
>>> # Create the LogisticRegression model
>>> model = LogisticRegression()
>>> model.fit(X, y)
LogisticRegression()
>>> model.predict(X) # doctest:+SKIP
   predicted_outcome                            predicted_outcome_probs  feature0  feature1  feature2
0                  0  [{'label': 1, 'prob': 3.1895929877221615e-07} ...        20         0       0.2
1                  0   [{'label': 1, 'prob': 5.662891265051953e-06} ...        21         1       0.3
2                  1   [{'label': 1, 'prob': 0.9999917826885262} {'l...        19         1       0.4
3                  1   [{'label': 1, 'prob': 0.9999999993659574} {'l...        18         0       0.5
[4 rows x 5 columns in total]
>>> # Score the model
>>> score = model.score(X, y)
>>> score # doctest:+SKIP
   precision  recall  accuracy  f1_score  log_loss  roc_auc
0        1.0     1.0       1.0       1.0  0.000004      1.0
[1 rows x 6 columns in total]
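The columns of the score() output are standard binary-classification metrics. The plain-Python sketch below applies the textbook definitions, with positive label 1 and a 0.5 threshold, to show what each column measures. Whether BigQuery ML uses the same threshold, clipping, or multiclass averaging is an assumption not covered here, and `classification_metrics` is a hypothetical helper.

```python
import math
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], p1: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Binary metrics; y_true holds 0/1 labels, p1 the predicted probability of label 1."""
    y_pred = [1 if p >= threshold else 0 for p in p1]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

    # Log loss: mean negative log-likelihood, clipping probabilities to avoid log(0).
    eps = 1e-15
    total = 0.0
    for t, p in zip(y_true, p1):
        q = min(max(p, eps), 1 - eps)
        total += -math.log(q) if t == 1 else -math.log(1 - q)
    log_loss = total / len(y_true)

    # ROC AUC: share of (positive, negative) pairs where the positive scores higher; ties count half.
    pos = [p for t, p in zip(y_true, p1) if t == 1]
    neg = [p for t, p in zip(y_true, p1) if t == 0]
    pairs = [1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg]
    roc_auc = sum(pairs) / len(pairs) if pairs else float("nan")

    return {"precision": precision, "recall": recall, "accuracy": accuracy,
            "f1_score": f1, "log_loss": log_loss, "roc_auc": roc_auc}


# Label-1 probabilities copied from the predict() output above.
y_true = [0, 0, 1, 1]
p1 = [3.1895929877221615e-07, 5.662891265051953e-06, 0.9999917826885262, 0.9999999993659574]
metrics = classification_metrics(y_true, p1)
print(round(metrics["log_loss"], 6))  # → 4e-06
```

With those probabilities, the sketch gives 1.0 for precision, recall, accuracy, F1, and ROC AUC, and a log loss of about 0.000004. These values are consistent with the score() output above. Agreement on one tiny example does not show that the formulas are identical, however.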
Parameters

| Name | Description |
|---|---|
| optimize_strategy | str, default "auto_strategy". The strategy used to train logistic regression models. Possible values are "auto_strategy" and "batch_gradient_descent". The two are equivalent, since "auto_strategy" falls back to "batch_gradient_descent"; the option is kept for API consistency. |
| fit_intercept | bool, default True. Specifies whether a constant (a.k.a. bias or intercept) should be added to the decision function. |
| class_weight | dict or 'balanced', default None. Weights associated with classes, given either as a dict that maps class labels to weights or as "balanced". |
| l1_reg | float or None, default None. The amount of L1 regularization applied. If unset, 0 is used. |
| l2_reg | float, default 0.0. The amount of L2 regularization applied. |
| max_iterations | int, default 20. The maximum number of training iterations or steps. |
| warm_start | bool, default False. Determines whether to train a model with new training data, new model options, or both. Unless you explicitly override them, the options used to train the model initially are reused for the warm-start run. |
| learning_rate | float or None, default None. The learning rate for gradient descent when learning_rate_strategy='constant'. If unset, 0.1 is used. Setting it when learning_rate_strategy='line_search' returns an error. |
| learning_rate_strategy | str, default "line_search". The strategy for specifying the learning rate during training. Possible values are "line_search" and "constant". |
| tol | float, default 0.01. The minimum relative loss improvement required to continue training when EARLY_STOP is set to true. For example, a value of 0.01 means that each iteration must reduce the loss by at least 1% for training to continue. |
| ls_init_learning_rate | float or None, default None. The initial learning rate used when learning_rate_strategy='line_search'. This option can only be used with line search. If unset, 0.1 is used. |
| calculate_p_values | bool, default False. Specifies whether to compute p-values and standard errors during training. |
| enable_global_explain | bool, default False. Whether to compute global explanations with Explainable AI to evaluate the global importance of the features to the model. |
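The class_weight description does not say how "balanced" weights are derived. A common convention, used by scikit-learn, weights each class by n_samples / (n_classes * class_count), so rarer classes count for more in the loss. Whether BigQuery DataFrames uses exactly this formula is an assumption. The hypothetical helper below computes those weights. Its string-keyed dict also shows the kind of mapping that a dict-valued class_weight takes, according to the signature (typing.Dict[str, float]).

```python
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[str]) -> Dict[str, float]:
    """scikit-learn-style 'balanced' weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical imbalanced labels: one "spam" row for every three "ham" rows.
labels = ["spam", "ham", "ham", "ham"]
weights = balanced_class_weights(labels)
print(weights)  # → {'spam': 2.0, 'ham': 0.6666666666666666}
```

Under this convention, each class contributes the same total weight (here 1 × 2.0 = 3 × 0.667 ≈ 2). This is what offsets the class imbalance.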