
Want To Find Year On Year Calculation Using Groupby And Apply For Various Years

I have a dataframe as follows:

       MARKET  PRODUCT  TIMEPERIOD        DATE  VALUES
0  USA MARKET    APPLE     QUARTER  2020-06-01     100
1  USA MARKET    APPLE      YEARLY  2020-06-01    1000
2  ...

Solution 1:

You can use itertools.combinations to get the year-to-year combinations, and then do the rest of the manipulation inside a function that is applied to each group, like this:

import numpy as np
import pandas as pd
from itertools import combinations

def get_annual_growth(grp):
    # Get all possible combinations of two years from the group
    # (assumes DATE is datetime64; if not, first run df_2['DATE'] = pd.to_datetime(df_2['DATE']))
    year_comb_lists = np.sort([sorted(comb) for comb in combinations(grp.DATE.dt.year, 2)])
    # Remove combinations whose years are more than 1 apart (for example, 2018-2020)
    year_comb_lists = year_comb_lists[(np.diff(year_comb_lists) == 1).flatten()]  # comment this line out if that is not the case
    # Get year-combination labels, e.g. '2019-2020'
    year_comb_strings = ['-'.join(map(str, comb)) for comb in year_comb_lists]

    # Create a sub-dataframe to be concatenated afterwards by pandas `groupby`
    subdf = pd.DataFrame(columns=['Annual Reference', 'Annual Growth (%)'])
    for i, years in enumerate(year_comb_lists):  # for each year combination ...
        actual_value = grp[grp.DATE.dt.year == years[1]].VALUES.mean()
        last_value = grp[grp.DATE.dt.year == years[0]].VALUES.mean()
        growth = (actual_value - last_value) / last_value * 100  # annual growth in percent
        subdf.loc[i] = [year_comb_strings[i], round(growth, 2)]
    return subdf

df_2.groupby(['TIMEPERIOD', 'MARKET', 'PRODUCT']).apply(get_annual_growth)

Output:

                                    Annual Reference  Annual Growth (%)
TIMEPERIOD MARKET     PRODUCT
QUARTER    UK MARKET  GRAPE      0         2019-2020                300
                                 1         2018-2019                  0
                      WATERMELON 0         2019-2020                -60
                                 1         2018-2019                  0
           USA MARKET APPLE      0         2019-2020             -66.67
                                 1         2018-2019                  0
                      PEAR       0         2019-2020                100
                                 1         2018-2019                  0
YEARLY     UK MARKET  GRAPE      0         2019-2020                900
                                 1         2018-2019                  0
                      WATERMELON 0         2019-2020            1566.67
                                 1         2018-2019                  0
           USA MARKET APPLE      0         2019-2020                -50
                                 1         2018-2019                  0
                      PEAR       0         2019-2020              66.67
                                 1         2018-2019                  0
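To see what the pairing step in get_annual_growth produces, here is a minimal sketch on a hypothetical list of years for one group (assumed to appear newest first), followed by the growth arithmetic for one cell of the output. The values 50 (2019) and 200 (2020) are an assumption, inferred from the UK MARKET / QUARTER / GRAPE list [200, 50, 50] shown in Solution 2.

```python
from itertools import combinations

import numpy as np

# Hypothetical years for one group, in row order (assumed newest first)
years = [2020, 2019, 2018]

# Same pairing steps as get_annual_growth: sorted 2-year combinations ...
year_comb_lists = np.sort([sorted(comb) for comb in combinations(years, 2)])
# ... filtered down to consecutive years
year_comb_lists = year_comb_lists[(np.diff(year_comb_lists) == 1).flatten()]
print(year_comb_lists.tolist())  # [[2019, 2020], [2018, 2019]]

# Growth for UK MARKET / QUARTER / GRAPE, assuming 2019 -> 50 and 2020 -> 200
last_value, actual_value = 50, 200
growth = (actual_value - last_value) / last_value * 100
print(growth)  # 300.0
```

This reproduces the 0/1 row order (2019-2020 first, then 2018-2019) and the 300 in the first row of the output above.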

Solution 2:

Please find this approach.

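df = df_2.groupby(['MARKET', 'TIMEPERIOD', 'PRODUCT'])['VALUES'].apply(list).reset_index()

def func(x):
    # Assumes each group's VALUES list is ordered newest year first (2020, 2019, ...),
    # since the column names count down from 2021
    year = 2021
    for i in range(1, len(x['VALUES'])):
        colname = str(year - i) + '-Growth'
        # Absolute change relative to the previous year's value, in percent (abs drops the sign)
        x[colname] = round(abs(x['VALUES'][i] - x['VALUES'][i-1]) / x['VALUES'][i] * 100, 2)
    return x

df = df.apply(func, axis=1).drop('VALUES', axis=1)
print(df)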

It's generalized code that should work for all previous years back to 2013, as mentioned in the comment.

Prints:

       MARKET  TIMEPERIOD     PRODUCT  2020-Growth  2019-Growth
0   UK MARKET     QUARTER       GRAPE       300.00          0.0
1   UK MARKET     QUARTER  WATERMELON        60.00          0.0
2   UK MARKET      YEARLY       GRAPE       900.00          0.0
3   UK MARKET      YEARLY  WATERMELON      1566.67          0.0
4  USA MARKET     QUARTER       APPLE        66.67          0.0
5  USA MARKET     QUARTER        PEAR       100.00          0.0
6  USA MARKET      YEARLY       APPLE        50.00          0.0
7  USA MARKET      YEARLY        PEAR        66.67          0.0
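The claim that the loop generalizes to earlier years follows from how the column names are built: with year = 2021 hard-coded, a list of n values (newest first) yields n - 1 growth columns labelled from 2020 downwards. A quick check, assuming hypothetically one value per year from 2020 back to 2013:

```python
# Hypothetical group with one value per year, 2020 back to 2013 (8 values)
n_values = len(range(2013, 2021))
year = 2021  # the constant hard-coded in func above

# Same naming rule as func: one column per consecutive pair of list entries
labels = [str(year - i) + '-Growth' for i in range(1, n_values)]
print(labels)
```

This prints seven labels, '2020-Growth' down to '2014-Growth'; 2013 gets no column because it has no earlier year to compare with.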

Explanation:

First, I group by MARKET, TIMEPERIOD and PRODUCT and collect each group's VALUES into a list:

df_2.groupby(['MARKET','TIMEPERIOD','PRODUCT'])['VALUES'].apply(list).reset_index()

For example:

       MARKET TIMEPERIOD     PRODUCT              VALUES
0   UK MARKET    QUARTER       GRAPE       [200, 50, 50]
1   UK MARKET    QUARTER  WATERMELON     [200, 500, 500]
....

Then I apply a row-wise function that loops through the 'VALUES' list and computes the growth between each pair of consecutive entries.
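Because func wraps the change in abs, a fall (WATERMELON) and a rise look the same. As an alternative that is not part of this answer, pandas' GroupBy.pct_change gives signed year-on-year growth without building lists. This is a minimal sketch assuming one row per year per group; the frame is hypothetical, with VALUES taken from the two lists above and DATE values (2020, 2019, 2018 for the list entries in order) assumed.

```python
import pandas as pd

# Hypothetical sample: VALUES taken from the lists above; DATE values are assumed
df_2 = pd.DataFrame({
    'MARKET': ['UK MARKET'] * 6,
    'TIMEPERIOD': ['QUARTER'] * 6,
    'PRODUCT': ['GRAPE'] * 3 + ['WATERMELON'] * 3,
    'DATE': pd.to_datetime(['2020-06-01', '2019-06-01', '2018-06-01'] * 2),
    'VALUES': [200, 50, 50, 200, 500, 500],
})

keys = ['MARKET', 'TIMEPERIOD', 'PRODUCT']
# Sort oldest first so each row is compared with the previous year in its group
out = df_2.sort_values(keys + ['DATE']).reset_index(drop=True)
# Signed growth in percent; the first year of each group has no predecessor (NaN)
out['YoY (%)'] = (out.groupby(keys)['VALUES'].pct_change() * 100).round(2)
print(out)
```

With this assumed ordering, the 2020 rows come out as 300.0 for GRAPE and -60.0 for WATERMELON, matching the signed 2019-2020 figures in Solution 1's output.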

Solution 3:

I made some changes to Caina Max's answer to accommodate my real data, which has several months per year: there can be 2020-06-01, 2020-03-01, 2019-12-01 and so on. I therefore had to make the changes below to get pairs of dates that are exactly one year apart, namely [2019-06-01, 2020-06-01], [2019-03-01, 2020-03-01], [2018-12-01, 2019-12-01], etc.

import numpy as np
import pandas as pd
from itertools import combinations
from dateutil.relativedelta import relativedelta

def get_annual_growth(grp):
    # Get all possible combinations of two dates from the group
    year_comb_lists = np.sort([sorted(comb) for comb in combinations(grp.DATE, 2)])
    # Keep only the pairs whose dates are exactly 12 months apart
    new_year_comb_lists = [comb_dates for comb_dates in year_comb_lists
                           if comb_dates[0] == comb_dates[1] - relativedelta(months=12)]

    # Label each pair with its later date
    year_comb_strings = [comb[1] for comb in new_year_comb_lists]

    # Create a sub-dataframe to be concatenated afterwards by pandas `groupby`
    subdf = pd.DataFrame(columns=['Annual Reference', 'Annual Growth (%)'])
    for i, years in enumerate(new_year_comb_lists):  # for each date pair ...
        actual_value = grp[grp['DATE'] == years[1]].VALUES.mean()
        last_value = grp[grp['DATE'] == years[0]].VALUES.mean()
        growth = (actual_value - last_value) / last_value * 100  # annual growth in percent
        subdf.loc[i] = [year_comb_strings[i], growth]
    return subdf

df_growth = df_2.groupby(['TIMEPERIOD', 'MARKET', 'PRODUCT']).apply(get_annual_growth)
df_growth = df_growth.reset_index()
df_growth['Annual Reference'] = pd.to_datetime(df_growth['Annual Reference'])
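The pairing rule used above (keep a pair only if the earlier date equals the later date minus 12 months) can be checked on its own. This is a minimal sketch with hypothetical month-start dates of the kind described; like the answer, it uses dateutil's relativedelta.

```python
from itertools import combinations

import pandas as pd
from dateutil.relativedelta import relativedelta

# Hypothetical dates of the kind described above (several months per year)
dates = pd.to_datetime(['2020-06-01', '2020-03-01', '2019-12-01',
                        '2019-06-01', '2019-03-01', '2018-12-01'])

# Sort each pair ascending, then keep pairs exactly 12 months apart
pairs = [
    (earlier, later)
    for earlier, later in (sorted(comb) for comb in combinations(dates, 2))
    if earlier == later - relativedelta(months=12)
]

for earlier, later in pairs:
    print(earlier.date(), later.date())
```

It prints the three pairs listed at the top of this answer: 2019-06-01/2020-06-01, 2019-03-01/2020-03-01 and 2018-12-01/2019-12-01. Pairs such as 2018-12-01/2019-06-01 are dropped because they are only six months apart.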
