
Sample Maximum Possible Data Points From Distribution To New Distribution

Context Assume there is a distribution of three nominal classes over each calendar week from an elicitation, e.g. like this: | Week | Class | Count | Distribution | Desired Distrib

Solution 1:

You can calculate the maximum possible total count for each week, then multiply it by the desired distribution. The idea is:

  1. Divide the Count by the Desired Distribution to get the total that each class could support on its own.
  2. Take the minimum of these possible totals within each week using groupby. This is the largest total that every class in the week can support.
  3. Multiply that total by the Desired Distribution and round down to get the sample numbers.
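
As a worked check of the three steps, here is a minimal plain-Python sketch for Week 1 only, using the counts and desired shares from the example table. The variable names and the small epsilon added before rounding down are assumptions for illustration, not part of the original answer.

```python
import math

# Week 1 figures from the example table: class -> (count, desired share)
week1 = {"A": (954, 0.55), "B": (554, 0.29), "C": (1145, 0.16)}

# Step 1: total sample size each class could support on its own
possible_totals = {cls: count / share for cls, (count, share) in week1.items()}

# Step 2: the smallest of these is the largest total all classes can support
max_total = min(possible_totals.values())

# Step 3: scale by the desired share and round down to whole data points.
# The epsilon is an added guard against float round-off (an assumption).
new_counts = {
    cls: math.floor(max_total * share + 1e-9) for cls, (_, share) in week1.items()
}

print(new_counts)  # {'A': 954, 'B': 503, 'C': 277}
```

Class A is the limiting class here: all 954 of its data points are used, and the other two classes are scaled down to match.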

In code:

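# Total that each row's class could support at its desired share
possible_total = df['Count'].div(df['Desired Distribution'])
# The smallest value within a week is the largest feasible total for that week
week_total = possible_total.groupby(df['Week']).transform('min')
# Scale back by the desired share and round down to whole data points
df['new_count'] = week_total.mul(df['Desired Distribution']).floordiv(1).astype(int)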

Output:

   Week Class  Count  Distribution  Desired Distribution  new_count
0     1     A    954          0.36                  0.55        954
1     1     B    554          0.21                  0.29        503
2     1     C   1145          0.43                  0.16        277
3     2     A    454          0.21                  0.55        454
4     2     B    944          0.44                  0.29        239
5     2     C    748          0.35                  0.16        132
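
To confirm that these counts reproduce the desired distribution, the shares can be recomputed from the new_count column. The snippet below is a small sketch with the values copied by hand from the output table; the dictionary layout is an assumption for illustration.

```python
# new_count values copied from the output table: week -> {class: new_count}
new_counts = {
    1: {"A": 954, "B": 503, "C": 277},
    2: {"A": 454, "B": 239, "C": 132},
}
desired = {"A": 0.55, "B": 0.29, "C": 0.16}

# Share of each class within its week after resampling
achieved = {}
for week, counts in new_counts.items():
    total = sum(counts.values())
    achieved[week] = {cls: round(n / total, 3) for cls, n in counts.items()}

print(achieved)
```

For both weeks the achieved shares agree with the desired 0.55 / 0.29 / 0.16 to within rounding. The small deviations come from rounding the counts down to whole numbers.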
