Conditional Running Count In Pandas For All Previous Rows Only
Suppose I have the following DataFrame: df = pd.DataFrame({'Event': ['A', 'B', 'A', 'A', 'B', 'C', 'B', 'B', 'A', 'C'], 'Date': ['2019-01-01', '2019-02-01', '2
Solution 1:
groupby
+ rank
Dates can be treated as numeric. Use'min'
to get your counting logic.
df['PEC']=(df.groupby('Event').Date.rank(method='min')-1).astype(int)EventDatePEC0A2019-01-01 01B2019-02-01 02A2019-03-01 13A2019-03-01 14B2019-02-15 15C2019-03-15 06B2019-04-05 27B2019-04-05 28A2019-04-15 39C2019-06-10 1
Solution 2:
First get counts by GroupBy.size
per both columns, then aggregate by first level with shift
and cumulative sum and last join
to original:
s=(df.groupby(['Event','Date']).size().groupby(level=0).apply(lambdax:x.shift(1).cumsum()).fillna(0).astype(int))df=df.join(s.rename('Previous_Event_Count'),on=['Event','Date'])print(df)EventDatePrevious_Event_Count0A2019-01-01 01B2019-02-01 02A2019-03-01 13A2019-03-01 14B2019-02-15 15C2019-03-15 06B2019-04-05 27B2019-04-05 28A2019-04-15 39C2019-06-10 1
Post a Comment for "Conditional Running Count In Pandas For All Previous Rows Only"