Unicode Warning When Using NLTK Stopwords With TfidfVectorizer of scikit-learn

June 22, 2024

I am trying to use scikit-learn's TF-IDF vectorizer (TfidfVectorizer) with the Spanish stopwords from NLTK:

from nltk.corpus import stopwords
vectorizer = TfidfVectorizer(stop_words=stopwo

Solution 1:

Actually, the problem was easier to solve than I thought. The issue is that NLTK does not return unicode objects but str objects (in Python 2, str is a byte string), so the words have to be decoded from UTF-8 before they are used:

spanish_stopwords = [word.decode('utf-8') for word in stopwords.words('spanish')]
vectorizer = TfidfVectorizer(stop_words=spanish_stopwords)
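The effect of the decoding step can be illustrated without NLTK or scikit-learn. This is a minimal sketch under stated assumptions: the three hard-coded words stand in for NLTK's Spanish list (which is not loaded here), and normalize_stopwords and remove_stopwords are hypothetical helpers, not part of either library. It runs on Python 3, where a bytes object never compares equal to a str, so an undecoded list silently removes nothing; under Python 2 the same mismatch on non-ASCII words is what produces the UnicodeWarning.

```python
# Minimal sketch. Assumption: a tiny hand-picked subset stands in for
# NLTK's Spanish stopword list, which this snippet does not load.


def normalize_stopwords(words, encoding="utf-8"):
    """Hypothetical helper: return every stopword as text (str).

    Byte strings are decoded with `encoding`; items that are already
    str are passed through unchanged.
    """
    return [w.decode(encoding) if isinstance(w, bytes) else w for w in words]


def remove_stopwords(tokens, stopwords):
    """Hypothetical helper: drop tokens that exactly match a stopword."""
    stop = set(stopwords)
    return [t for t in tokens if t not in stop]


# Stopwords as UTF-8 byte strings, like the Python 2 str objects NLTK returned.
raw = ["de".encode("utf-8"), "más".encode("utf-8"), "también".encode("utf-8")]
tokens = ["más", "datos", "de", "texto", "también"]

# Without decoding, bytes never equal str in Python 3, so nothing is removed.
undecoded = remove_stopwords(tokens, raw)
# After decoding, the comparison is text-to-text and the stopwords are dropped.
decoded = remove_stopwords(tokens, normalize_stopwords(raw))

print(undecoded)  # → ['más', 'datos', 'de', 'texto', 'también']
print(decoded)    # → ['datos', 'texto']
```

Because normalize_stopwords leaves str items untouched, it is harmless to apply to a list that is already text.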
