Open
In the score_paragraphs method, the content score is calculated like this:
content_score += len(inner_text.split(','))
However, I think it should be as follows, because the text may contain no commas at all:
content_score += len(re.split(r' |,', inner_text))
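To illustrate the difference, here is a minimal sketch comparing the two expressions on a sentence without commas. The sample string is made up for this example, and the variable names comma_score and token_score are hypothetical, not taken from the library.

```python
import re

# Made-up sample text with no commas (an assumption for illustration only).
inner_text = "Readability extracts the main article text from a page"

# Current scoring: number of comma-separated segments.
comma_score = len(inner_text.split(','))

# Proposed scoring: number of segments after splitting on a space or a comma.
token_score = len(re.split(r' |,', inner_text))

print(comma_score)  # 1
print(token_score)  # 9
```

One thing to keep in mind: with this pattern a comma followed by a space produces an empty segment, so "a, b" is counted as 3 segments rather than 2.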
I also suggest adding the following, so that non-word tokens and words shorter than 3 characters are not taken into account:
inner_text = " ".join(re.findall(r"[^\d\W]{3,}", inner_text))
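As a rough sketch of what this filter keeps, the snippet below applies the same pattern to a made-up sentence (the sample text is an assumption for illustration). The character class [^\d\W] matches word characters that are not digits, so it keeps letters and also underscores.

```python
import re

# Made-up sample text containing numbers, punctuation and short words.
inner_text = "In 2019, an AI model read 42 pages of text!"

# Keep only runs of 3 or more characters that are neither digits nor non-word characters.
filtered = " ".join(re.findall(r"[^\d\W]{3,}", inner_text))

print(filtered)  # model read pages text
```

Numbers, punctuation, and the short words "In", "an", "AI" and "of" are dropped, so a score computed on the filtered text would count only the four remaining words.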