Splitting the text in scoring #113

Open
@haziyevv

In the score_paragraphs method, the content score is calculated like this:
content_score += len(inner_text.split(','))

But I think it should be as below, because a text may contain no commas, in which case the current line always adds 1 regardless of length:
content_score += len(re.split(' |,',inner_text))
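To illustrate the difference between the two expressions, here is a minimal sketch. The helper names `comma_score` and `space_or_comma_score` and the sample strings are hypothetical and not taken from the library; only the two split expressions come from the lines above.

```python
import re


def comma_score(inner_text: str) -> int:
    """Current expression: count the comma-separated pieces."""
    return len(inner_text.split(','))


def space_or_comma_score(inner_text: str) -> int:
    """Proposed expression: split on every single space or comma."""
    return len(re.split(' |,', inner_text))


# Hypothetical sample inputs, chosen only for illustration.
no_commas = "a plain sentence without any punctuation"
with_commas = "first, second, third"

print(comma_score(no_commas))             # 1: no comma, so one piece
print(space_or_comma_score(no_commas))    # 6: one piece per word
print(space_or_comma_score(with_commas))  # 5: includes two empty strings
```

Note that with the pattern `' |,'` a comma followed by a space produces an empty string between the two separators, and that empty string is counted too. If that is not intended, a pattern such as `'[ ,]+'` treats a run of separators as one.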

Also, I think this could be added before scoring: ignore non-word tokens and words shorter than 3 characters.
inner_text = " ".join(re.findall(r"[^\d\W]{3,}", inner_text))
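As a rough sketch of what that filter keeps, assuming a made-up sample sentence and a hypothetical wrapper name `keep_long_words` (neither is part of the library):

```python
import re


def keep_long_words(inner_text: str) -> str:
    """Keep only runs of 3+ word characters that are not digits.

    [^\\d\\W] matches any word character except a digit, i.e. letters
    (including non-ASCII ones) and the underscore.
    """
    return " ".join(re.findall(r"[^\d\W]{3,}", inner_text))


# Hypothetical sample input, chosen only for illustration.
sample = "In 2020, we saw 15 big changes to it"

print(keep_long_words(sample))  # saw big changes
```

Numbers, punctuation, and one- or two-letter words are dropped, so the subsequent split only counts longer words.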
