Python PANDAS: Resampling Multivariate Time Series With A Groupby
I have data in the following general format that I would like to resample to 30 day time series windows: 'customer_id','transaction_dt','product','price','units' 1,2004-01-02,thi
Solution 1:
Edited for new solution. I think you can convert each of the transaction_dt
to a Period object of 30 days and then do the grouping.
p = pd.period_range('2004-1-1', '12-31-2018',freq='30D')
def find_period(v):
p_idx = np.argmax(v < p.end_time)
return p[p_idx]
df['period'] = df['transaction_dt'].apply(find_period)
df
customer_id transaction_dt product price units period
0 1 2004-01-02 thing1 25 47 2004-01-01
1 1 2004-01-17 thing2 150 8 2004-01-01
2 2 2004-01-29 thing2 150 25 2004-01-01
3 3 2017-07-15 thing3 55 17 2017-06-21
4 3 2016-05-12 thing3 55 47 2016-04-27
5 4 2012-02-23 thing2 150 22 2012-02-18
6 4 2009-10-10 thing1 25 12 2009-10-01
7 4 2014-04-04 thing2 150 2 2014-03-09
8 5 2008-07-09 thing2 150 43 2008-07-08
We can now use this dataframe to get the concatenation of products, weighted average of price and sum of units. We then use some of the Period functionality to get the end time.
def my_funcs(df):
data = {}
data['product'] = '/'.join(df['product'].tolist())
data['units'] = df.units.sum()
data['price'] = np.average(df['price'], weights=df['units'])
data['transaction_dt'] = df['transaction_dt'].iloc[0]
data['window_start_time'] = df['period'].iloc[0].start_time
data['window_end_time'] = df['period'].iloc[0].end_time
return pd.Series(data, index=['transaction_dt', 'product', 'price','units',
'window_start_time', 'window_end_time'])
df.groupby(['customer_id', 'period']).apply(my_funcs).reset_index('period', drop=True)
Post a Comment for "Python PANDAS: Resampling Multivariate Time Series With A Groupby"