Pandas Dataframe: Aggregate Values Within Blocks Of Repeating Ids
Given a DataFrame with an ID column and corresponding values column, how can I aggregate (let's say sum) the values within blocks of repeating IDs? Example DF: import numpy as np i
Solution 1:
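You need to create a helper Series first: compare the id column with its shifted values for inequality using ne, then take the cumulative sum with cumsum, so that every block of consecutive equal ids gets its own number. Pass this Series to groupby; the id column can be passed together with it in a list. Then remove the first level of the resulting MultiIndex with reset_index(level=0, drop=True) and convert the remaining index to the column id with a second reset_index: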
print (df['id'].ne(df['id'].shift()).cumsum())
0 1
1 1
2 1
3 1
4 1
5 2
6 2
7 2
8 3
9 3
10 4
11 5
12 6
13 6
14 6
Name: id, dtype: int32
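To make the helper Series less opaque, the three steps can be split apart. This is only a sketch: the question's DataFrame is cut off above, so the data here is made up, and the names shifted, is_new_block and block are chosen for illustration.

```python
import pandas as pd

# Hypothetical data; the DataFrame from the question is not shown in full.
df = pd.DataFrame({
    'id': ['a', 'a', 'b', 'b', 'a', 'b'],
    'v': [1, 2, 3, 4, 5, 6],
})

shifted = df['id'].shift()            # id of the previous row (NaN for the first row)
is_new_block = df['id'].ne(shifted)   # True wherever the id differs from the previous row
block = is_new_block.cumsum()         # counting the True values numbers the blocks

print(pd.DataFrame({
    'id': df['id'],
    'shifted': shifted,
    'is_new_block': is_new_block,
    'block': block,
}))
# block numbers: [1, 1, 2, 2, 3, 4]
```

The first row always starts a block, because comparing a value with NaN counts as not equal.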
df1 = (df.groupby([df['id'].ne(df['id'].shift()).cumsum(), 'id'])['v'].sum()
         .reset_index(level=0, drop=True)
         .reset_index())
print (df1)
id v
0 a 5.0
1 b 3.0
2 a 2.0
3 b 1.0
4 a 1.0
5 b 3.0
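For comparison, here is a self-contained run of the same approach next to a plain groupby on id. The data is hypothetical (the original example is truncated), and per_block and per_id are illustrative names. The plain groupby merges all rows with the same id, while the helper Series keeps separated blocks apart.

```python
import pandas as pd

# Hypothetical data; not the DataFrame from the question.
df = pd.DataFrame({
    'id': ['a', 'a', 'b', 'b', 'a', 'b'],
    'v': [1, 2, 3, 4, 5, 6],
})

# Block number for each row: increases by one whenever the id changes.
block = df['id'].ne(df['id'].shift()).cumsum()

# Sum within each block of repeating ids.
per_block = (df.groupby([block, 'id'])['v'].sum()
               .reset_index(level=0, drop=True)
               .reset_index())

# Plain groupby: sums over every row with the same id, regardless of position.
per_id = df.groupby('id')['v'].sum().reset_index()

print(per_block)   # four rows: a=3, b=7, a=5, b=6
print(per_id)      # two rows: a=8, b=13
```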
Another idea is to use GroupBy.agg with a dictionary and aggregate the id column with GroupBy.first:
df1 = (df.groupby(df['id'].ne(df['id'].shift()).cumsum(), as_index=False)
         .agg({'id': 'first', 'v': 'sum'}))
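A possible variation, not taken from the answer itself, is to rename the helper Series so it does not share its name with the id column, use named aggregation, and drop the block index afterwards instead of relying on as_index=False. The data and the name block are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data; not the DataFrame from the question.
df = pd.DataFrame({
    'id': ['a', 'a', 'b', 'b', 'a', 'b'],
    'v': [1, 2, 3, 4, 5, 6],
})

# Helper Series of block numbers, renamed to avoid clashing with the 'id' column.
block = df['id'].ne(df['id'].shift()).cumsum().rename('block')

df1 = (df.groupby(block)
         .agg(id=('id', 'first'), v=('v', 'sum'))   # named aggregation: output=(column, function)
         .reset_index(drop=True))                   # discard the block numbers

print(df1)   # four rows: a=3, b=7, a=5, b=6
```

Every row in a block has the same id, so taking the first one is enough to label the block.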