Sample Maximum Possible Data Points From Distribution To New Distribution
Context: Assume there is a distribution of three nominal classes over each calendar week from an elicitation, e.g. like this: | Week | Class | Count | Distribution | Desired Distrib
Solution 1:
You can calculate the maximal total count for each week, then multiply that by the desired distribution. The idea is:
- Divide `Count` by `Desired Distribution` to get the total each class could support on its own.
- Calculate the minimal possible total for each week with `groupby`. This minimum is the largest total that every class can still supply.
- Then multiply that total by the `Desired Distribution` to get the sample numbers.
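To make the three steps concrete, here is the arithmetic for week 1 done by hand in plain Python. It is only a sketch: the counts and desired shares are the week 1 values from the output table in this post, the variable names are illustrative, and the small epsilon before flooring is an added guard against floating-point error rather than part of the original answer.

```python
import math

# Week 1 counts and desired shares, taken from the output table in this post.
counts = {"A": 954, "B": 554, "C": 1145}
desired = {"A": 0.55, "B": 0.29, "C": 0.16}

# Step 1: total sample size each class could support on its own.
possible_totals = {c: counts[c] / desired[c] for c in counts}

# Step 2: the scarcest class caps the total for the whole week.
week_total = min(possible_totals.values())

# Step 3: scale the cap back to per-class sample sizes, rounding down.
# The epsilon (an assumption, not in the original answer) keeps a result
# such as 953.9999999 from being floored to 953.
new_counts = {c: math.floor(week_total * desired[c] + 1e-9) for c in counts}

print(new_counts)
```

Class A is the bottleneck here, so all of its rows are kept and the other two classes are cut down to match the desired shares.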
In code:
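# Per class: total the class could support; per week: the smallest of those;
# then scale back by the desired share and round down to whole rows.
df['new_count'] = (df['Count'].div(df['Desired Distribution'])
                     .groupby(df['Week']).transform('min')
                     .mul(df['Desired Distribution'])
                     // 1
                  ).astype(int)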
Output:
   Week Class  Count  Distribution  Desired Distribution  new_count
0     1     A    954          0.36                  0.55        954
1     1     B    554          0.21                  0.29        503
2     1     C   1145          0.43                  0.16        277
3     2     A    454          0.21                  0.55        454
4     2     B    944          0.44                  0.29        239
5     2     C    748          0.35                  0.16        132
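The `new_count` column only says how many rows to keep per week and class. If the underlying data has one row per data point, the rows still have to be drawn. The post does not show that row-level data, so the sketch below makes one up: `rows` and its `value` column are hypothetical stand-ins, and the epsilon before flooring is an added floating-point guard, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Summary table with the counts and desired shares from the output above.
df = pd.DataFrame({
    "Week": [1, 1, 1, 2, 2, 2],
    "Class": ["A", "B", "C", "A", "B", "C"],
    "Count": [954, 554, 1145, 454, 944, 748],
    "Desired Distribution": [0.55, 0.29, 0.16, 0.55, 0.29, 0.16],
})

# Same three steps as the one-liner, with a small epsilon before flooring.
week_total = (df["Count"] / df["Desired Distribution"]).groupby(df["Week"]).transform("min")
df["new_count"] = np.floor(week_total * df["Desired Distribution"] + 1e-9).astype(int)

# Hypothetical row-level data: one row per counted data point.
rows = df.loc[df.index.repeat(df["Count"]), ["Week", "Class"]].reset_index(drop=True)
rows["value"] = range(len(rows))  # placeholder payload column

# Look up the target size for each (Week, Class) pair.
targets = df.set_index(["Week", "Class"])["new_count"].to_dict()

# Draw the target number of rows from each group without replacement.
parts = [
    group.sample(n=targets[key], random_state=0)
    for key, group in rows.groupby(["Week", "Class"])
]
sampled = pd.concat(parts)

# Per-group sizes of the sample, for comparison with new_count.
print(sampled.groupby(["Week", "Class"]).size())
```

Looping over the groups keeps the `Week` and `Class` columns in the result; a fixed `random_state` is used here only to make the draw repeatable.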