DataFrame.drop_duplicates And DataFrame.drop Not Removing Rows
I have read a CSV into a pandas DataFrame that has five columns. Certain rows have duplicate values only in the second column, and I want to remove these rows from the DataFrame b
Solution 1:
In my case, the issue was that I was concatenating DataFrames whose columns had different types:
import pandas as pd
# Same column names, but 'code' holds an int in s1 and a string in s2
s1 = pd.DataFrame([['a', 1]], columns=['letter', 'code'])
s2 = pd.DataFrame([['a', '1']], columns=['letter', 'code'])
df = pd.concat([s1, s2])
df = df.reset_index(drop=True)
df.drop_duplicates(inplace=True)
# Still 2 rows: the int 1 and the string '1' are not equal, so nothing is dropped
print(df)
# int
print(type(df.at[0, 'code']))
# str
print(type(df.at[1, 'code']))
# Fix: cast the column to a single type, then drop duplicates again
df['code'] = df['code'].astype(str)
df.drop_duplicates(inplace=True)
# 1 row
print(df)
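If you are not sure which column is causing this, a quick check along the following lines may help. This is only a sketch: `mixed_type_columns` is a hypothetical helper name I made up, not part of pandas, and it assumes the mismatch is between plain Python types stored in an object column, as in the example above.

```python
import pandas as pd

def mixed_type_columns(df: pd.DataFrame) -> dict:
    """Hypothetical helper: map each column that holds more than one
    Python type to the set of type names found in it."""
    mixed = {}
    for col in df.columns:
        # Ignore missing values so NaN does not count as a separate type
        type_names = set(df[col].dropna().map(lambda v: type(v).__name__))
        if len(type_names) > 1:
            mixed[col] = type_names
    return mixed

# Same data as in the answer above
s1 = pd.DataFrame([['a', 1]], columns=['letter', 'code'])
s2 = pd.DataFrame([['a', '1']], columns=['letter', 'code'])
df = pd.concat([s1, s2], ignore_index=True)

# Only 'code' should be reported, since it mixes an int and a string
report = mixed_type_columns(df)
print(report)

# Cast every reported column to str, then drop duplicates
for col in report:
    df[col] = df[col].astype(str)
deduped = df.drop_duplicates()
print(len(deduped))  # 1 row remains
```

Casting to `str` is just one choice; if the column is meant to be numeric, converting it to a numeric type instead would work the same way. Also note that `drop_duplicates` compares whole rows by default; if only one column should decide what counts as a duplicate, pass it via the `subset` argument, for example `df.drop_duplicates(subset=['code'])`.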