[Django]-Python equivalent to WordPress sanitize_text

2👍

The python-slugify library has a stopwords parameter which can be used in conjunction with nltk as follows:

from slugify import slugify
from nltk.corpus import stopwords

text = 'mygubbi raises $25 mn seed funding from bigbasket co founder others'
# Words in the stopwords list (here "from") are dropped from the slug.
print(slugify(text, stopwords=stopwords.words('english')))

This would print:

mygubbi-raises-25-mn-seed-funding-bigbasket-co-founder-others

After installing nltk, you can install additional corpora, one of which is stopwords. To do this, run nltk's built-in download utility as follows:

import nltk

nltk.download()

In the NLTK download helper window that opens, select Corpora, scroll down to stopwords and click the Download button.
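If no GUI is available (for example on a server), the corpus can, as far as I know, also be fetched non-interactively by passing its identifier to nltk.download. A minimal sketch, where the helper name ensure_stopwords is made up for this example and the corpus is assumed to live under corpora/stopwords in the nltk data directory:

```python
def ensure_stopwords():
    """Download the nltk stopwords corpus only if it is not installed yet."""
    import nltk  # imported lazily so this module loads even without nltk

    try:
        # Raises LookupError when the corpus is not found locally.
        nltk.data.find("corpora/stopwords")
    except LookupError:
        # Fetch just the stopwords corpus, without opening the GUI.
        nltk.download("stopwords", quiet=True)
```

Calling ensure_stopwords() once at startup, before the first stopwords.words('english') call, should be enough.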

1👍

There is a Python module called nltk that lets you do exactly this.

http://www.bogotobogo.com/python/NLTK/tokenization_tagging_NLTK.php

Scroll down a little on that page to the heading “Removing Stop Words”. It has examples of how to do this with nltk.
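To give a rough idea of what stop-word removal amounts to, here is a sketch in plain Python. It is not the code from the linked page: STOP_WORDS is a tiny hand-written set standing in for nltk's much larger stopwords.words('english') list, and remove_stop_words is a hypothetical helper name.

```python
import re

# Hypothetical, deliberately tiny stop-word set for illustration only;
# nltk's English stop-word list is much larger.
STOP_WORDS = {"a", "an", "and", "from", "in", "of", "the", "to"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return the lowercase word tokens of text that are not stop words."""
    # Keep runs of letters and digits; punctuation such as "$" is dropped.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in stop_words]


words = remove_stop_words(
    "mygubbi raises $25 mn seed funding from bigbasket co founder others"
)
# Joining the remaining words with hyphens gives a slug-like string.
slug = "-".join(words)
```

With the real nltk list you would pass set(stopwords.words('english')) as stop_words instead of the small set above.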
