Description
Incorrect predictions when fitting a LogisticRegression model on binary outcomes with multi_class='multinomial'.
Steps/Code to Reproduce
from sklearn.linear_model import LogisticRegression
import sklearn.metrics
import numpy as np
# Set up a logistic regression object
lr = LogisticRegression(C=1000000, multi_class='multinomial',
solver='sag', tol=0.0001, warm_start=False,
verbose=0)
# Set independent variable values
Z = np.array([
[ 0. , 0. ],
[ 1.33448632, 0. ],
[ 1.48790105, -0.33289528],
[-0.47953866, -0.61499779],
[ 1.55548163, 1.14414766],
[-0.31476657, -1.29024053],
[-1.40220786, -0.26316645],
[ 2.227822 , -0.75403668],
[-0.78170885, -1.66963585],
[ 2.24057471, -0.74555021],
[-1.74809665, 2.25340192],
[-1.74958841, 2.2566389 ],
[ 2.25984734, -1.75106702],
[ 0.50598996, -0.77338402],
[ 1.21968303, 0.57530831],
[ 1.65370219, -0.36647173],
[ 0.66569897, 1.77740068],
[-0.37088553, -0.92379819],
[-1.17757946, -0.25393047],
[-1.624227 , 0.71525192]])
# Set dependent variable values
Y = np.array([1, 0, 0, 1, 0, 0, 0, 0,
0, 0, 1, 1, 1, 0, 0, 1,
0, 0, 1, 1], dtype=np.int32)
lr.fit(Z, Y)
p = lr.predict_proba(Z)
print(sklearn.metrics.log_loss(Y, p)) # ...
print(lr.intercept_)
print(lr.coef_)
Expected Results
If we compare against R, or against scikit-learn itself with multi_class='ovr', the log loss (which is approximately proportional to the objective function, since the regularisation is made negligible through the choice of C) should come out to roughly 0.5922995.
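For comparison, here is a minimal sketch of that check (it reuses Z and Y from the reproduction above; the lr_ovr name is just illustrative):
from sklearn.linear_model import LogisticRegression
import sklearn.metrics
# Same data and same (negligible) regularisation, but one-vs-rest, which for
# two classes is ordinary binary logistic regression.
lr_ovr = LogisticRegression(C=1000000, multi_class='ovr',
                            solver='sag', tol=0.0001)
lr_ovr.fit(Z, Y)
p_ovr = lr_ovr.predict_proba(Z)
print(sklearn.metrics.log_loss(Y, p_ovr))  # roughly 0.5922995, per the figures above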
Actual Results
The actual log loss when using multi_class='multinomial' is 0.61505641264.
Further Information
See the Stack Exchange question https://stats.stackexchange.com/questions/306886/confusing-behaviour-of-scikit-learn-logistic-regression-multinomial-optimisation?noredirect=1#comment583412_306886 for more information.
The issue, it seems, is caused by https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/linear_model/logistic.py#L762. In the multinomial case, even if classes.size == 2, we cannot reduce to a 1D case by throwing away one of the two vectors of coefficients (as we can in ordinary binary logistic regression): the two-class softmax is a sigmoid of the difference of the two coefficient vectors, not of either one alone. This is essentially the difference between softmax (where the parametrisation is redundant) and plain logistic regression.
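To illustrate the point, here is a small numpy sketch (the helper functions below are purely illustrative, not scikit-learn code): a two-class softmax equals a sigmoid of the difference of the two coefficient rows, so it is not equivalent to a sigmoid of either row on its own.
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def softmax_p1(W, b, X):
    # probability of class 1 under a two-row softmax model
    scores = X.dot(W.T) + b
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability only
    e = np.exp(scores)
    return e[:, 1] / e.sum(axis=1)

rng = np.random.RandomState(0)
X = rng.randn(5, 2)
W = rng.randn(2, 2)   # two coefficient rows, w0 and w1
b = rng.randn(2)

p_softmax = softmax_p1(W, b, X)
p_diff = sigmoid(X.dot(W[1] - W[0]) + (b[1] - b[0]))  # equivalent binary model
p_one_row = sigmoid(X.dot(W[1]) + b[1])               # keeping only w1, as the reduction does

print(np.allclose(p_softmax, p_diff))     # True: softmax == sigmoid of the difference
print(np.allclose(p_softmax, p_one_row))  # False in general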
This can be fixed by commenting out lines 762 and 763. I am apprehensive, however, that this may cause other unknown issues, which is why I am posting it as a bug.
Versions
Linux-4.10.0-33-generic-x86_64-with-Ubuntu-16.04-xenial
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
NumPy 1.13.1
SciPy 0.19.1
Scikit-Learn 0.19.0