Want To Find Year On Year Calculation Using Groupby And Apply For Various Years
Solution 1:
You can use itertools.combinations to get the year-to-year combinations, and do the rest of the calculation inside a function that is applied to each group, like this:
import numpy as np
import pandas as pd
from itertools import combinations
def get_annual_growth(grp):
    # Get all possible combinations of the years in the dataset
    year_comb_lists = np.sort([sorted(comb) for comb in combinations(grp.DATE.dt.year, 2)])
    # Remove combinations whose difference is greater than 1 (for example, 2018-2020);
    # comment out this line if you want to keep them
    year_comb_lists = year_comb_lists[(np.diff(year_comb_lists) == 1).flatten()]
    # Get year-combination labels
    year_comb_strings = ['-'.join(map(str, comb)) for comb in year_comb_lists]
    # Create the sub-dataframe that pandas `groupby` concatenates afterwards
    subdf = pd.DataFrame(columns=['Annual Reference', 'Annual Growth (%)'])
    for i, years in enumerate(year_comb_lists):  # for each year combination ...
        actual_value = grp[grp.DATE.dt.year == years[1]].VALUES.mean()
        last_value = grp[grp.DATE.dt.year == years[0]].VALUES.mean()
        # annual growth as a percentage, matching the column label and the output below
        growth = (actual_value - last_value) / last_value * 100
        subdf.loc[i, :] = [year_comb_strings[i], growth]
    return subdf

df_2.groupby(['TIMEPERIOD', 'MARKET', 'PRODUCT']).apply(get_annual_growth)
Output:
                                   Annual Reference  Annual Growth (%)
TIMEPERIOD  MARKET      PRODUCT
QUARTER     UK MARKET   GRAPE       0     2019-2020                300
                                    1     2018-2019                  0
                        WATERMELON  0     2019-2020                -60
                                    1     2018-2019                  0
            USA MARKET  APPLE       0     2019-2020             -66.67
                                    1     2018-2019                  0
                        PEAR        0     2019-2020                100
                                    1     2018-2019                  0
YEARLY      UK MARKET   GRAPE       0     2019-2020                900
                                    1     2018-2019                  0
                        WATERMELON  0     2019-2020            1566.67
                                    1     2018-2019                  0
            USA MARKET  APPLE       0     2019-2020                -50
                                    1     2018-2019                  0
                        PEAR        0     2019-2020              66.67
                                    1     2018-2019                  0
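The year-pairing step inside get_annual_growth can be looked at in isolation. The following is only a minimal sketch: the helper name consecutive_year_pairs and the sample years are hypothetical, and unlike the function above it deduplicates and sorts the years first, so the labels come out oldest first.

```python
from itertools import combinations


def consecutive_year_pairs(years):
    """Return (earlier, later) pairs of years that are exactly one year apart."""
    unique_years = sorted(set(years))
    # combinations() yields every 2-element pair; keep only adjacent years
    return [(a, b) for a, b in combinations(unique_years, 2) if b - a == 1]


pairs = consecutive_year_pairs([2020, 2019, 2018])
labels = [f"{a}-{b}" for a, b in pairs]
print(labels)
```

The pair (2018, 2020) is generated by combinations but dropped by the filter, which is the same effect as the np.diff(...) == 1 line in the answer.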
Solution 2:
Here is another approach:
df = df_2.groupby(['MARKET', 'TIMEPERIOD', 'PRODUCT'])['VALUES'].apply(list).reset_index()

def func(x):
    year = 2021
    for i in range(1, len(x['VALUES'])):
        colname = str(year - i) + '-Growth'
        x[colname] = round(abs(x['VALUES'][i] - x['VALUES'][i-1]) / x['VALUES'][i] * 100, 2)
    return x

df = df.apply(lambda x: func(x), axis=1).drop('VALUES', axis=1)
print(df)
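This is generalized code that should work for all previous years, dating back to 2013 as mentioned in the comment. Note that year is hard-coded to 2021, so each list of values is expected to start at 2020 and run backwards, and that abs() means the growth is reported without its sign.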
Prints:
       MARKET TIMEPERIOD     PRODUCT  2020-Growth  2019-Growth
0   UK MARKET    QUARTER       GRAPE       300.00          0.0
1   UK MARKET    QUARTER  WATERMELON        60.00          0.0
2   UK MARKET     YEARLY       GRAPE       900.00          0.0
3   UK MARKET     YEARLY  WATERMELON      1566.67          0.0
4  USA MARKET    QUARTER       APPLE        66.67          0.0
5  USA MARKET    QUARTER        PEAR       100.00          0.0
6  USA MARKET     YEARLY       APPLE        50.00          0.0
7  USA MARKET     YEARLY        PEAR        66.67          0.0
Explanation:
First, I group by the key columns and collect each group's values into a list:
df_2.groupby(['MARKET','TIMEPERIOD','PRODUCT'])['VALUES'].apply(list).reset_index()
For example:
      MARKET TIMEPERIOD     PRODUCT           VALUES
0  UK MARKET    QUARTER       GRAPE    [200, 50, 50]
1  UK MARKET    QUARTER  WATERMELON  [200, 500, 500]
....
Then I apply a function to each row that loops through the 'VALUES' list column and does the growth calculation.
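The two steps described above can be sketched end to end on a tiny frame. This is an illustrative variant, not the answer's exact code: the sample frame, the helper name growth_from_list and its latest_year parameter are assumptions, the lists are assumed to be ordered newest year first, and the sketch keeps the sign of the change instead of taking abs().

```python
import pandas as pd

# Hypothetical sample mirroring the two example rows (newest year first)
sample = pd.DataFrame({
    "MARKET": ["UK MARKET"] * 6,
    "TIMEPERIOD": ["QUARTER"] * 6,
    "PRODUCT": ["GRAPE"] * 3 + ["WATERMELON"] * 3,
    "VALUES": [200, 50, 50, 200, 500, 500],
})

# Step 1: collect each group's values into a list
lists = (
    sample.groupby(["MARKET", "TIMEPERIOD", "PRODUCT"])["VALUES"]
    .apply(list)
    .reset_index()
)


def growth_from_list(values, latest_year=2020):
    """Map 'YYYY-Growth' to the signed percentage change versus the prior year."""
    out = {}
    for i in range(1, len(values)):
        newer, older = values[i - 1], values[i]
        out[f"{latest_year - i + 1}-Growth"] = round((newer - older) / older * 100, 2)
    return out


# Step 2: compute the growth columns row by row and attach them to the keys
growth = pd.DataFrame([growth_from_list(v) for v in lists["VALUES"]])
result = pd.concat([lists.drop(columns="VALUES"), growth], axis=1)
print(result)
```

Keeping the sign makes a drop such as WATERMELON's 500 to 200 show up as negative growth, which agrees with the output of Solution 1.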
Solution 3:
I made some changes to Caina Max's answer to accommodate my real data, which has several months per year. The dates can be 2020-06-01, 2020-03-01, 2019-12-01 and so on, so I changed the code to pair up the dates that are exactly one year apart, namely [2019-06-01, 2020-06-01], [2019-03-01, 2020-03-01], [2018-12-01, 2019-12-01], etc.
import numpy as np
import pandas as pd
from itertools import combinations
from dateutil.relativedelta import relativedelta

def get_annual_growth(grp):
    # Get all possible combinations of the dates in the dataset
    year_comb_lists = np.sort([sorted(comb) for comb in combinations(grp.DATE, 2)])
    # Keep only the pairs whose dates are exactly 12 months apart
    new_year_comb_lists = [comb_dates for comb_dates in year_comb_lists if comb_dates[0] == comb_dates[1] - relativedelta(months=12)]
    # Label each pair with its later date
    year_comb_strings = [comb[1] for comb in new_year_comb_lists]
    # Create the sub-dataframe that pandas `groupby` concatenates afterwards
    subdf = pd.DataFrame(columns=['Annual Reference', 'Annual Growth (%)'])
    for i, years in enumerate(new_year_comb_lists):  # for each pair of dates ...
        actual_value = grp[grp['DATE'] == years[1]].VALUES.mean()
        last_value = grp[grp['DATE'] == years[0]].VALUES.mean()
        # annual growth as a percentage, matching the column label
        growth = (actual_value - last_value) / last_value * 100
        subdf.loc[i, :] = [year_comb_strings[i], growth]
    return subdf

df_2 = df_2.groupby(['TIMEPERIOD', 'MARKET', 'PRODUCT']).apply(get_annual_growth)
df_2 = df_2.reset_index()
df_2['Annual Reference'] = pd.to_datetime(df_2['Annual Reference'])
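Pairing dates that are exactly one year apart can also be done without enumerating every combination. The sketch below uses a different technique from the answer above: it shifts each date forward with pd.DateOffset(years=1) and merges the group with itself. The sample frame and the PREV_VALUES column name are hypothetical, and it assumes one row per date within a group.

```python
import pandas as pd

# Hypothetical single-group sample with several months per year
grp = pd.DataFrame({
    "DATE": pd.to_datetime(
        ["2020-06-01", "2020-03-01", "2019-06-01", "2019-03-01", "2018-12-01"]
    ),
    "VALUES": [120.0, 90.0, 100.0, 60.0, 40.0],
})

# Shift every observation forward by one year so that it lines up with
# the date it should be compared against
previous = grp.assign(DATE=grp["DATE"] + pd.DateOffset(years=1)).rename(
    columns={"VALUES": "PREV_VALUES"}
)

# The inner merge keeps only dates that have a counterpart one year earlier
paired = grp.merge(previous, on="DATE", how="inner").sort_values(
    "DATE", ignore_index=True
)
paired["Annual Growth (%)"] = (
    (paired["VALUES"] - paired["PREV_VALUES"]) / paired["PREV_VALUES"] * 100
)
print(paired)
```

Dates without a match one year earlier, such as 2018-12-01 in this sample, simply drop out of the result, which is the same outcome as the relativedelta(months=12) filter.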