Pandas Affects Results Of Rapidfuzz Match?

May 30, 2024

I am hitting a wall with this. Rapidfuzz gives a different string similarity score when I run it inside a pandas dataframe than when I run it by itself. Why the results for

Solution 1:

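The discrepancy comes from passing the entire column to fuzz. Converting a whole column with str() gives the printed form of the Series (index labels, all rows, and the dtype footer) rather than a single address, so fuzz compares different text. If you apply fuzz to the individual row instead, you get the same result: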

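# Keep the rows without a score; .copy() avoids SettingWithCopyWarning on the assignment below
test_anui = test_anui[(test_anui['Address Similarity'].isnull()) & (test_anui['Address Similarity'] != '')].copy()
# .at[0, ...] reads the single cell with index label 0, not the whole column
test_anui['Address Similarity 2'] = fuzz.token_sort_ratio(str(test_anui.at[0, 'Processed Client Address']), str(test_anui.at[0, 'Processed Aruvio Address']))

print('the address similarity is different? ', fuzz.token_sort_ratio(address_a, address_b))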

Alternatively, using .loc:

# Same filter as before; .copy() avoids SettingWithCopyWarning on the assignment below
test_anui = test_anui[(test_anui['Address Similarity'].isnull()) & (test_anui['Address Similarity'] != '')].copy()
# .loc[0, ...] also reads the single cell with index label 0
test_anui['Address Similarity 2'] = fuzz.token_sort_ratio(str(test_anui.loc[0, 'Processed Client Address']), str(test_anui.loc[0, 'Processed Aruvio Address']))

print('the address similarity is different? ', fuzz.token_sort_ratio(address_a, address_b))

The output in the dataframe is:

    Processed Client Name         Processed Aruvio Name  \
0  anhui jinhan clothing co ltd  anhui jinhan clothing co ltd   

                            Processed Client Address  \
0  high new technology development zones huainan ...   

        Processed Aruvio Address  Name Similarity  Address Similarity  \
0  industrial park of funan city        89.285714                 NaN   

   Address Similarity 2
0             28.099174

and the result of fuzz.token_sort_ratio(address_a, address_b) is 28.099173553719012.
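To see why the whole-column call scores differently, it may help to print what str() returns for a column versus a single cell. This is a minimal sketch using only pandas; the two-row frame and its second address are hypothetical, and only the column name is taken from the question.

```python
import pandas as pd

# Hypothetical two-row frame; only the column name comes from the question.
df = pd.DataFrame({
    "Processed Client Address": [
        "industrial park of funan city",
        "1 example road hefei",  # made-up address for illustration
    ]
})

# str() on the column returns the printed Series: index labels,
# every row, and a trailing "Name: ..., dtype: ..." footer.
whole_column = str(df["Processed Client Address"])

# str() on one cell returns just that address.
single_cell = str(df.at[0, "Processed Client Address"])

print(repr(whole_column))
print(repr(single_cell))
```

Any scorer that receives whole_column is comparing that multi-line printout, not the address in row 0, which would explain a score that does not match the standalone call.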

In other words, you need to specify which row you are extracting the strings from. I assume your dataframe has several rows, in which case you need to do this for each row:

# Iterate over the index labels and write each score into its own row
for i in test_anui.index:
    test_anui.loc[i, 'Address Similarity 2'] = fuzz.token_sort_ratio(
        str(test_anui.loc[i, 'Processed Client Address']),
        str(test_anui.loc[i, 'Processed Aruvio Address']))
