Skip to content Skip to sidebar Skip to footer

Conditional Running Count In Pandas For All Previous Rows Only

Suppose I have the following DataFrame: df = pd.DataFrame({'Event': ['A', 'B', 'A', 'A', 'B', 'C', 'B', 'B', 'A', 'C'], 'Date': ['2019-01-01', '2019-02-01', '2

Solution 1:

groupby + rank

Dates can be treated as numeric. Use'min' to get your counting logic.

df['PEC']=(df.groupby('Event').Date.rank(method='min')-1).astype(int)EventDatePEC0A2019-01-01    01B2019-02-01    02A2019-03-01    13A2019-03-01    14B2019-02-15    15C2019-03-15    06B2019-04-05    27B2019-04-05    28A2019-04-15    39C2019-06-10    1

Solution 2:

First get counts by GroupBy.size per both columns, then aggregate by first level with shift and cumulative sum and last join to original:

s=(df.groupby(['Event','Date']).size().groupby(level=0).apply(lambdax:x.shift(1).cumsum()).fillna(0).astype(int))df=df.join(s.rename('Previous_Event_Count'),on=['Event','Date'])print(df)EventDatePrevious_Event_Count0A2019-01-01                     01B2019-02-01                     02A2019-03-01                     13A2019-03-01                     14B2019-02-15                     15C2019-03-15                     06B2019-04-05                     27B2019-04-05                     28A2019-04-15                     39C2019-06-10                     1

Solution 3:

Finally, I can find a better and faster way to get the desired result. It turns out that it is very easy. One can try:

df['Total_Previous_Sale'] = df.groupby('Event').cumcount() \
                          - df.groupby(['Event', 'Date']).cumcount()

Post a Comment for "Conditional Running Count In Pandas For All Previous Rows Only"