DataFrame.drop_duplicates And DataFrame.drop Not Removing Rows

October 29, 2022

I have read a CSV file into a pandas DataFrame, and it has five columns. Certain rows have duplicate values only in the second column, and I want to remove these rows from the dataframe b

Solution 1:

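In my case, the issue was that I was concatenating DataFrames whose columns had different types, so values that looked identical when printed (the integer 1 and the string '1') did not compare as equal: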

import pandas as pd

s1 = pd.DataFrame([['a', 1]], columns=['letter', 'code'])
s2 = pd.DataFrame([['a', '1']], columns=['letter', 'code'])
df = pd.concat([s1, s2])
df = df.reset_index(drop=True)
df.drop_duplicates(inplace=True)

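# Still 2 rows: the integer 1 and the string '1' are not equal
print(df)

# <class 'int'>
print(type(df.at[0, 'code']))
# <class 'str'>
print(type(df.at[1, 'code']))

# Fix: cast the column to a single type, then deduplicate again
df['code'] = df['code'].astype(str)
df.drop_duplicates(inplace=True)

# 1 row
print(df)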
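
If you are not sure which column is mixing types, a quick check along these lines may help. This is only a sketch: the helper name mixed_type_columns is made up for illustration and is not part of pandas. It simply collects the Python type names found in each column and reports the columns that contain more than one.

```python
import pandas as pd


def mixed_type_columns(df):
    """Hypothetical helper: map each column holding more than one
    Python type to the set of type names found in it."""
    mixed = {}
    for col in df.columns:
        # Collect the name of the Python type of every value in the column
        type_names = set(df[col].map(lambda v: type(v).__name__))
        if len(type_names) > 1:
            mixed[col] = type_names
    return mixed


s1 = pd.DataFrame([['a', 1]], columns=['letter', 'code'])
s2 = pd.DataFrame([['a', '1']], columns=['letter', 'code'])
df = pd.concat([s1, s2]).reset_index(drop=True)

# Reports the 'code' column, which holds both an int and a str
report = mixed_type_columns(df)
print(report)

# Cast the reported column to one type on a copy, then deduplicate
deduped = df.astype({'code': str}).drop_duplicates()
print(len(deduped))
```

Casting to str is the same fix as above; if the column is really numeric, converting it to a number instead (for example with pd.to_numeric) may be the better choice for your data.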
