Tfidf Calculating Confusion

November 23, 2023

I found the following code on the internet for calculating TFIDF: https://github.com/timtrueman/tf-idf/blob/master/tf-idf.py I added '1+' in the function def idf(word, documentLis

Solution 1:

No. Tf-idf is tf, a non-negative value, times idf, a non-negative value, so it can never be negative. This code seems to implement the erroneous definition of tf-idf that was on Wikipedia for years (it has since been fixed).
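To illustrate the point, here is a minimal sketch of the standard definition, with idf computed as log(N / df). The function names and the toy corpus are made up for this example and are not taken from the linked script. Since df can never exceed N, the logarithm is never below zero.

```python
import math

def tf(word, document):
    # Term frequency: occurrences of the word divided by the document length.
    words = document.split()
    return words.count(word) / len(words)

def idf(word, documents):
    # Standard idf: log(N / df). Only defined when the word occurs
    # in at least one document (df >= 1).
    df = sum(1 for doc in documents if word in doc.split())
    return math.log(len(documents) / df)

def tf_idf(word, document, documents):
    return tf(word, document) * idf(word, documents)

# Toy corpus, assumed purely for illustration.
docs = ["the cat sat", "the dog ran", "the cat ran home"]
scores = [tf_idf(w, d, docs) for d in docs for w in d.split()]
```

Every entry in scores is zero or positive. A word such as "the", which appears in every document, scores exactly zero.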

Solution 2:

If the word in question is contained in every document in the collection, your 1+ change will result in a negative value: 0 < x / (1 + x) < 1 holds for all x > 0, so the logarithm is negative.
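A quick numeric check of this, assuming the 1+ was added to the denominator only, i.e. idf = log(N / (1 + df)). The function name and the document counts are hypothetical.

```python
import math

def idf_plus_one_denominator(num_docs, docs_containing):
    # Assumed variant: "1 +" added to the denominator only.
    return math.log(num_docs / (1 + docs_containing))

# Word present in all 10 of 10 documents: log(10 / 11), below zero.
in_every_doc = idf_plus_one_denominator(10, 10)

# Word present in 1 of 10 documents: log(10 / 2), above zero.
in_one_doc = idf_plus_one_denominator(10, 1)
```

Only words that occur in every document go negative under this variant. A word missing from exactly one document lands on log(1) = 0.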

In my opinion the correct IDF for a nonexistent word is infinite or undefined. But if you add 1+ to both the denominator and the numerator, a nonexistent word gets an IDF slightly higher than that of any existing word, and a word that exists in every document gets an IDF of zero. Both cases will probably work well with your code.
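A small sketch of that variant, assuming idf = log((1 + N) / (1 + df)). Again, the function name and the counts are made up for illustration.

```python
import math

def idf_smoothed(num_docs, docs_containing):
    # Assumed variant: "1 +" added to both numerator and denominator.
    return math.log((1 + num_docs) / (1 + docs_containing))

# Word that occurs in none of the 10 documents: finite, and the largest value.
absent = idf_smoothed(10, 0)

# Word that occurs in a single document: largest value among existing words.
rare = idf_smoothed(10, 1)

# Word that occurs in all 10 documents: log(11 / 11), which is zero.
everywhere = idf_smoothed(10, 10)
```

With both terms shifted by one there is no division by zero for unseen words, and no value drops below zero.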
