Skip to content Skip to sidebar Skip to footer

Python Pandas: Assign Last Value Of Dataframe Group To All Entries Of That Group

In Python Pandas, I have a DataFrame. I group this DataFrame by a column and want to assign the last value of a column to all rows of another column. I know that I am able to selec

Solution 1:

Use transform with last:

df['b_new'] = df.groupby('a')['b'].transform('last')

Alternative:

df['b_new'] = df.groupby('a')['b'].transform(lambda x: x.iat[-1])

print(df)
   a   b  b_new
0  1  20     21
1  1  21     21
2  2  30     30
3  3  40     41
4  3  41     41

Solution with nth and join:

df = df.join(df.groupby('a')['b'].nth(-1).rename('b_new'), 'a')
print(df)
   a   b  b_new
0  1  20     21
1  1  21     21
2  2  30     30
3  3  40     41
4  3  41     41

Timings:

N = 10000

df = pd.DataFrame({'a':np.random.randint(1000,size=N),
                   'b':np.random.randint(10000,size=N)})

#print (df)


def f(df):
    return df.join(df.groupby('a')['b'].nth(-1).rename('b_new'), 'a')

#cᴏʟᴅsᴘᴇᴇᴅ1
In [211]: %timeit df['b_new'] = df.a.map(df.groupby('a').b.nth(-1))
100 loops, best of 3: 3.57 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ2
In [212]: %timeit df['b_new'] = df.a.replace(df.groupby('a').b.nth(-1))
10 loops, best of 3: 71.3 ms per loop

#jezrael1
In [213]: %timeit df['b_new'] = df.groupby('a')['b'].transform('last')
1000 loops, best of 3: 1.82 ms per loop

#jezrael2
In [214]: %timeit df['b_new'] = df.groupby('a')['b'].transform(lambda x: x.iat[-1])
10 loops, best of 3: 178 ms per loop
    
#jezrael3
In [219]: %timeit f(df)
100 loops, best of 3: 3.63 ms per loop

Caveat

The results do not address performance given the number of groups, which will affect timings a lot for some of these solutions.

Solution 2:

Two possibilities, with groupby + nth + map or replace

df['b_new'] = df.a.map(df.groupby('a').b.nth(-1))

Or,

df['b_new'] = df.a.replace(df.groupby('a').b.nth(-1))

You can also replace nth(-1) with last() (in fact, doing so happens to make this a little faster), but nth gives you more flexibility over what item to pick from each group in b.


df

   ab  b_new
012021112121223030334041434141

Solution 3:

I think this should be fast

df.merge(df.drop_duplicates('a',keep='last'),on='a',how='left')
Out[797]: 
   ab_xb_y012021112121223030334041434141

Post a Comment for "Python Pandas: Assign Last Value Of Dataframe Group To All Entries Of That Group"