@@ -132,8 +132,8 @@ which is fast to train and achieves a decent F-score::
  >>> clf = MultinomialNB(alpha=.01)
  >>> clf.fit(vectors, newsgroups_train.target)
  >>> pred = clf.predict(vectors_test)
- >>> metrics.f1_score(newsgroups_test.target, pred, average='weighted')
- 0.88251152461278892
+ >>> metrics.f1_score(newsgroups_test.target, pred, average='macro')
+ 0.88213592402729568
 
 (The example :ref:`example_text_document_classification_20newsgroups.py` shuffles
 the training and test data, instead of segmenting by time, and in that case
@@ -182,8 +182,8 @@ blocks, and quotation blocks respectively.
  ... categories=categories)
  >>> vectors_test = vectorizer.transform(newsgroups_test.data)
  >>> pred = clf.predict(vectors_test)
- >>> metrics.f1_score(pred, newsgroups_test.target, average='weighted')
- 0.78409163025839435
+ >>> metrics.f1_score(pred, newsgroups_test.target, average='macro')
+ 0.77310350681274775
 
 This classifier lost a lot of its F-score, just because we removed
 metadata that has little to do with topic classification.
@@ -193,12 +193,12 @@ It loses even more if we also strip this metadata from the training data:
  ... remove=('headers', 'footers', 'quotes'),
  ... categories=categories)
  >>> vectors = vectorizer.fit_transform(newsgroups_train.data)
- >>> clf = BernoulliNB(alpha=.01)
+ >>> clf = MultinomialNB(alpha=.01)
  >>> clf.fit(vectors, newsgroups_train.target)
  >>> vectors_test = vectorizer.transform(newsgroups_test.data)
  >>> pred = clf.predict(vectors_test)
- >>> metrics.f1_score(newsgroups_test.target, pred, average='weighted')
- 0.73160869205141166
+ >>> metrics.f1_score(newsgroups_test.target, pred, average='macro')
+ 0.76995175184521725
 
 Some other classifiers cope better with this harder version of the task. Try
 running :ref:`example_model_selection_grid_search_text_feature_extraction.py` with and without
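The diff switches every ``metrics.f1_score`` call from ``average='weighted'`` to ``average='macro'``. The following is a minimal pure-Python sketch of what the two averages compute, as I understand them: macro is the unweighted mean of the per-class F1 scores, while weighted scales each class by its number of true instances. The helper names (``per_class_f1``, ``macro_f1``, ``weighted_f1``) and the toy labels are hypothetical, written only for illustration; they are not scikit-learn's implementation and not the 20 newsgroups data.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, scoring each label one-vs-rest."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); define it as 0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1, weighted by each class's count in y_true."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Hypothetical, deliberately imbalanced toy labels (assumption for illustration).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 2, 0]

print(f"macro:              {macro_f1(y_true, y_pred):.3f}")     # 0.767
print(f"weighted:           {weighted_f1(y_true, y_pred):.3f}")  # 0.793
# Swapping the two arguments, as in the second hunk's f1_score(pred, target) call:
print(f"macro (swapped):    {macro_f1(y_pred, y_true):.3f}")     # 0.767
print(f"weighted (swapped): {weighted_f1(y_pred, y_true):.3f}")  # 0.807
```

In this sketch, swapping the arguments exchanges false positives and false negatives, which leaves each per-class F1 unchanged, so the macro average is order-insensitive. The weighted average changes because its weights come from whichever list is passed first. That may be relevant to the second hunk, where ``pred`` is passed before ``newsgroups_test.target``.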