How To Split A Dataframe In Pandas In Predefined Percentages?
I have a pandas dataframe sorted by a number of columns. Now I'd like to split the dataframe into predefined percentages, so as to extract and name a few segments. For example, I wa
Solution 1:
Use numpy.split:
a, b, c = np.split(df, [int(.2*len(df)), int(.5*len(df))])
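np.split cuts at the given row positions, so the call above amounts to positional slicing with iloc. Here is a minimal sketch of the same split in plain pandas, assuming a toy 20-row frame in place of the real sorted DataFrame. Recent pandas versions may emit a FutureWarning about DataFrame.swapaxes when np.split receives a DataFrame, and slicing with iloc avoids that path.

```python
import pandas as pd

# Toy frame standing in for the sorted DataFrame (an assumption for illustration)
df = pd.DataFrame({"x": range(20)})

n = len(df)
cut1, cut2 = int(0.2 * n), int(0.5 * n)  # row positions 4 and 10

# Same boundaries as np.split(df, [cut1, cut2]): first 20%, next 30%, remaining 50%
a = df.iloc[:cut1]
b = df.iloc[cut1:cut2]
c = df.iloc[cut2:]

print(len(a), len(b), len(c))  # 4 6 10
```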
Sample:
import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.random((20,5)), columns=list('ABCDE'))
#print (df)
a, b, c = np.split(df, [int(.2*len(df)), int(.5*len(df))])
print (a)
          A         B         C         D         E
0  0.543405  0.278369  0.424518  0.844776  0.004719
1  0.121569  0.670749  0.825853  0.136707  0.575093
2  0.891322  0.209202  0.185328  0.108377  0.219697
3  0.978624  0.811683  0.171941  0.816225  0.274074
print (b)
          A         B         C         D         E
4  0.431704  0.940030  0.817649  0.336112  0.175410
5  0.372832  0.005689  0.252426  0.795663  0.015255
6  0.598843  0.603805  0.105148  0.381943  0.036476
7  0.890412  0.980921  0.059942  0.890546  0.576901
8  0.742480  0.630184  0.581842  0.020439  0.210027
9  0.544685  0.769115  0.250695  0.285896  0.852395
print (c)
           A         B         C         D         E
10  0.975006  0.884853  0.359508  0.598859  0.354796
11  0.340190  0.178081  0.237694  0.044862  0.505431
12  0.376252  0.592805  0.629942  0.142600  0.933841
13  0.946380  0.602297  0.387766  0.363188  0.204345
14  0.276765  0.246536  0.173608  0.966610  0.957013
15  0.597974  0.731301  0.340385  0.092056  0.463498
16  0.508699  0.088460  0.528035  0.992158  0.395036
17  0.335596  0.805451  0.754349  0.313066  0.634037
18  0.540405  0.296794  0.110788  0.312640  0.456979
19  0.658940  0.254258  0.641101  0.200124  0.657625
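The question asks to extract and name segments of an already-sorted frame, so here is a sketch that generalizes the np.split approach to any number of named percentages while keeping row order. split_by_percentages is a hypothetical helper, not a pandas or NumPy function. It assumes the shares sum to 1 and, like int() in the np.split call, truncates each cut position; the last segment takes whatever rows remain so floating-point rounding cannot drop trailing rows.

```python
import numpy as np
import pandas as pd


def split_by_percentages(df, fractions):
    """Split df into consecutive, order-preserving segments.

    `fractions` maps a segment name to its share of the rows, e.g.
    {"top": 0.2, "middle": 0.3, "bottom": 0.5}. Hypothetical helper,
    not part of pandas or NumPy.
    """
    names = list(fractions)
    shares = np.array([fractions[k] for k in names], dtype=float)
    if not np.isclose(shares.sum(), 1.0):
        raise ValueError(f"fractions sum to {shares.sum()}, expected 1.0")

    n = len(df)
    # Cut positions after every segment except the last (truncated like int());
    # the last segment runs to the end of the frame.
    cuts = (np.cumsum(shares[:-1]) * n).astype(int)
    bounds = [0, *cuts.tolist(), n]
    return {
        name: df.iloc[start:stop]
        for name, start, stop in zip(names, bounds[:-1], bounds[1:])
    }


df = pd.DataFrame({"x": range(20)})
parts = split_by_percentages(df, {"top": 0.2, "middle": 0.3, "bottom": 0.5})
print({name: len(part) for name, part in parts.items()})
# {'top': 4, 'middle': 6, 'bottom': 10}
```

Returning a dict keeps each segment addressable by name, and since Python dicts preserve insertion order, the segments come back in the same order as the sorted rows.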
Solution 2:
Solution 3:
I've written a simple function that does the job; maybe it will help you.
P.S.:
- The fractions must sum to 1.
- It returns len(fracs) new DataFrames, so you can pass a fractions list as long as you want (e.g. fracs=[0.1, 0.1, 0.3, 0.2, 0.2]).
import math

import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.random((99, 4)))

def split_by_fractions(df: pd.DataFrame, fracs: list, random_state: int = 42):
    # Compare with a tolerance: a floating-point sum of fractions may not be exactly 1.0
    assert math.isclose(sum(fracs), 1.0), 'fractions sum is not 1.0 (fractions_sum={})'.format(sum(fracs))
    remain = df.index.copy().to_frame()
    res = []
    for i in range(len(fracs)):
        # Share of the rows still remaining that part i should take
        fractions_sum = sum(fracs[i:])
        frac = fracs[i] / fractions_sum
        idxs = remain.sample(frac=frac, random_state=random_state).index
        remain = remain.drop(idxs)
        res.append(idxs)
    return [df.loc[idxs] for idxs in res]

train, test, val = split_by_fractions(df, [0.8, 0.1, 0.1])  # e.g. train / test / validation sets
print(train.shape, test.shape, val.shape)
outputs:
(79, 4) (10, 4) (10, 4)
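The loop in split_by_fractions samples fracs[i] / sum(fracs[i:]) of the rows still remaining at step i. For [0.8, 0.1, 0.1] on 99 rows that is 0.8 of 99 (DataFrame.sample rounds this to 79 rows), then 0.1 / 0.2 = 0.5 of the remaining 20 (10 rows), then 1.0 of the last 10, which matches the output above. A different technique, sketched here under the assumption that a single random permutation is acceptable, is to shuffle once with DataFrame.sample(frac=1) and cut the shuffled frame at cumulative fractions. split_by_fractions_shuffled is a hypothetical name, not part of pandas.

```python
import math

import numpy as np
import pandas as pd


def split_by_fractions_shuffled(df, fracs, random_state=42):
    """Shuffle df once, then cut it into len(fracs) disjoint random parts.

    Hypothetical variant of split_by_fractions; cut positions are truncated
    cumulative fractions, so part sizes can differ by one row from per-step
    rounding.
    """
    if not math.isclose(sum(fracs), 1.0):
        raise ValueError(f"fractions sum to {sum(fracs)}, expected 1.0")
    shuffled = df.sample(frac=1, random_state=random_state)
    n = len(shuffled)
    # Cut after every part except the last, which takes the remainder
    cuts = (np.cumsum(fracs[:-1]) * n).astype(int).tolist()
    bounds = [0, *cuts, n]
    return [shuffled.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]


np.random.seed(100)
df = pd.DataFrame(np.random.random((99, 4)))
train, test, val = split_by_fractions_shuffled(df, [0.8, 0.1, 0.1])
print(train.shape, test.shape, val.shape)  # (79, 4) (10, 4) (10, 4)
```

Shuffling once guarantees every row lands in exactly one part without tracking a shrinking "remain" frame.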