Commit e1d2b6d

Merge pull request luispedro#6 from re4lfl0w/unicodedecodeerror_fix
Modify text handling to avoid UnicodeDecodeError. Decode the text as utf-8 when the Python version is 2.x.
2 parents 9cebdb7 + 4de7593 commit e1d2b6d

1 file changed: +3 -0 lines changed

ch05/classify.py

Lines changed: 3 additions & 0 deletions
@@ -54,6 +54,9 @@ def prepare_sent_features():
         if not text:
             meta[pid]['AvgSentLen'] = meta[pid]['AvgWordLen'] = 0
         else:
+            from platform import python_version
+            if python_version().startswith('2'):
+                text = text.decode('utf-8')
             sent_lens = [len(nltk.word_tokenize(
                 sent)) for sent in nltk.sent_tokenize(text)]
             meta[pid]['AvgSentLen'] = np.mean(sent_lens)
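
For comparison, the same decode-only-when-needed idea can be sketched with a type check instead of a version check. This is only an illustration, not part of the commit: the helper name `to_text` and its `encoding` parameter are hypothetical, and it assumes the input is either a byte string or already-decoded text.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as text, decoding only when it is a byte string.

    Hypothetical helper, not part of ch05/classify.py.
    On Python 2, `bytes` is an alias for `str`, so plain strings are decoded.
    On Python 3, `str` is already text and is returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Byte input is decoded; text input passes through untouched.
decoded = to_text(b"caf\xc3\xa9")
unchanged = to_text(u"caf\xe9")
```

A check on the value's type also covers the case where the text has already been decoded upstream, which a check on the interpreter version alone cannot detect.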
