Cumulative Sum Using 2 Columns

I am trying to create a column that holds a cumulative sum based on two columns. Please see the example of what I am trying to do:

index lodgement_year words sum cum

Solution 1:

You are almost there, Ian!

The cumsum() method calculates the cumulative sum of a pandas column. You want it applied within each group of words, so group by 'words' first:

In [303]: df_2['cumsum'] = df_2.groupby(['words'])['sum'].cumsum()

In [304]: df_2
Out[304]: 
   index  lodgement_year      words  sum  cum_sum  cumsum
0      0            2000        the   14       14      14
1      1            2000  australia   10       10      10
2      2            2000       word   12       12      12
3      3            2000      brand    8        8       8
4      4            2000      fresh    5        5       5
5      5            2001        the    8       22      22
6      6            2001  australia    3       13      13
7      7            2001     banana    1        1       1
8      8            2001      brand    7       15      15
9      9            2001      fresh    1        6       6

Please comment if this fails on your bigger data set, and we'll work on a possibly more accurate version of this.
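For anyone who wants to run this without the original data, here is a minimal self-contained sketch. The frame is rebuilt from the output shown above. The sort by lodgement_year is my own assumption, not part of the answer: it only matters if the rows of your bigger data set are not already in year order, since cumsum() follows row order.

```python
import pandas as pd

# Sample data copied from the output shown above.
df_2 = pd.DataFrame({
    "lodgement_year": [2000] * 5 + [2001] * 5,
    "words": ["the", "australia", "word", "brand", "fresh",
              "the", "australia", "banana", "brand", "fresh"],
    "sum": [14, 10, 12, 8, 5, 8, 3, 1, 7, 1],
})

# Assumption: sort by year first so the running total follows time order
# even when the rows arrive shuffled. A stable sort keeps ties in place.
ordered = df_2.sort_values("lodgement_year", kind="stable")

# Running total of 'sum' within each word; the assignment aligns on the index,
# so the values land on the right rows of the unsorted frame.
df_2["cumsum"] = ordered.groupby("words")["sum"].cumsum()

print(df_2)
```

If the running total should restart every year instead of carrying over, grouping by both columns, groupby(["words", "lodgement_year"]), would be the variant to try.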


Solution 2:

If we only need to consider the 'words' column, we can instead loop through its unique values and build the cumulative sum one word at a time:

import pandas as pd

for unique_word in df_2['words'].unique():
    # Cumulative sum of 'sum' over only the rows for this word.
    partial = df_2.loc[df_2['words'] == unique_word, 'sum'].cumsum()
    if 'cum_sum' not in df_2:
        # First pass creates the column; rows for other words start as NaN.
        df_2['cum_sum'] = partial
    else:
        df_2.update(pd.DataFrame({'cum_sum': partial}))

The above results in:

>>> print(df_2)
  lodgement_year  sum      words  cum_sum
0           2000   14        the     14.0
1           2000   10  australia     10.0
2           2000   12       word     12.0
3           2000    8      brand      8.0
4           2000    5      fresh      5.0
5           2001    8        the     22.0
6           2001    3  australia     13.0
7           2001    1     banana      1.0
8           2001    7      brand     15.0
9           2001    1      fresh      6.0
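
The values print as 14.0, 10.0 and so on because the first pass of the loop fills only one word's rows and leaves NaN everywhere else, which forces the column to float. Below is a small sketch of that effect, using a trimmed four-row subset of the data above and the two passes written out by hand. The final astype("int64") is my own addition, not part of the answer.

```python
import pandas as pd

# Trimmed subset of the sample data, for illustration only.
df_2 = pd.DataFrame({
    "words": ["the", "australia", "the", "australia"],
    "sum": [14, 10, 8, 3],
})

# First pass of the loop in isolation: only the rows for one word get a value.
first_pass = df_2.loc[df_2["words"] == "the", "sum"].cumsum()
df_2["cum_sum"] = first_pass
print(df_2["cum_sum"].dtype)  # float64, because the other rows are NaN

# Second pass fills the remaining word via index alignment.
second_pass = df_2.loc[df_2["words"] == "australia", "sum"].cumsum()
df_2.update(pd.DataFrame({"cum_sum": second_pass}))

# Once no NaN is left, the column can be converted back to integers.
df_2["cum_sum"] = df_2["cum_sum"].astype("int64")
print(df_2)
```

The cast only works after every word has been processed; with NaN still present, astype("int64") raises an error.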
