Removing Stopwords From List Of Lists

I would like to know how I can remove specific words, including stopwords, from a list of lists like this: my_list=[[], [], ['A'], ['SB'], [], ['NMR'], [], ['ISSN'], [], []

Solution 1:

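Loop through my_list, replacing each nested list with a copy that has the stop words removed. You can use set difference to remove the words, although converting to a set discards duplicates and does not preserve the original word order.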

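from nltk.corpus import stopwords  # needs the corpus: nltk.download('stopwords')
stop_words = stopwords.words('english')
# Set difference drops every word that appears in stop_words
my_list = [list(set(sublist).difference(stop_words)) for sublist in my_list]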
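
One caveat with the set-based version is that sets are unordered and drop duplicates, so a sublist may come back reordered and deduplicated. Here is a minimal sketch of the effect. It uses a small hand-written set, `sample_stop_words`, as a hypothetical stand-in for the NLTK list so that it runs without downloading a corpus, and `sample_list` is made-up data:

```python
# Hypothetical stand-in for stopwords.words('english'); not the real NLTK list.
sample_stop_words = {"a", "the", "of"}

# Made-up sample data with a duplicate word in the second sublist.
sample_list = [[], ["the", "nmr", "nmr", "issn"], ["a"]]

# Set difference: stop words removed, duplicates collapse, order not guaranteed.
via_set = [list(set(sublist).difference(sample_stop_words)) for sublist in sample_list]

# List comprehension: stop words removed, order and duplicates preserved.
via_filter = [[word for word in sublist if word not in sample_stop_words]
              for sublist in sample_list]

print(sorted(via_set[1]))  # ['issn', 'nmr']
print(via_filter[1])       # ['nmr', 'nmr', 'issn']
```

If order or repeated words matter, the list comprehension form is probably the safer choice.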

Doing the comparison case-insensitively is a little more involved, because the built-in set difference method compares strings exactly. The NLTK English stop words are lowercase, so lowercase each word before testing it:

my_list = [[word for word in sublist if word.lower() not in stop_words] for sublist in my_list]
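
Since the question also asks about removing specific words, one possible approach is to merge those extra words into the stop word collection and keep everything lowercased in a set, which also makes each membership test fast. The sketch below makes several assumptions: `base_stop_words` is a short hypothetical stand-in for the NLTK English list, `extra_words` and `remove_words` are made-up names, and `sample_list` is sample data modeled on the list in the question.

```python
def remove_words(nested, unwanted):
    """Return a new list of lists without any word found in `unwanted` (case-insensitive)."""
    unwanted_lower = {word.lower() for word in unwanted}
    return [[word for word in sublist if word.lower() not in unwanted_lower]
            for sublist in nested]

# Hypothetical stand-in for stopwords.words('english'); not the real NLTK list.
base_stop_words = ["a", "the", "of", "and"]
# Hypothetical extra words to drop in addition to the stop words.
extra_words = ["ISSN"]

# Sample data modeled on the question.
sample_list = [[], [], ["A"], ["SB"], [], ["NMR"], [], ["ISSN"], [], []]

cleaned = remove_words(sample_list, base_stop_words + extra_words)
print(cleaned)
# [[], [], [], ['SB'], [], ['NMR'], [], [], [], []]

# Optionally drop the sublists that ended up empty.
non_empty = [sublist for sublist in cleaned if sublist]
print(non_empty)
# [['SB'], ['NMR']]
```

With NLTK installed and the stopwords corpus downloaded, `stopwords.words('english')` could be passed in place of `base_stop_words`.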
