
Slicing A Dask Dataframe

I have the following code, in which I would like to do a train/test split on a Dask dataframe: df = dd.read_csv(csv_filename, sep=',', encoding='latin-1', names=cols, he

Solution 1:

Dask.dataframe doesn't support row-wise slicing by position. It does support the label-based loc operation if you have a sensible index (ideally a sorted one, so that Dask knows the partition boundaries).

However, in your case of train/test splitting, you will probably be better served by the random_split method, which assigns rows at random to each piece in roughly the given proportions.

train, test = df.random_split([0.80, 0.20])

You could also make many splits and concatenate them in different ways:

splits = df.random_split([0.20, 0.20, 0.20, 0.20, 0.20])

for i in range(5):
    trains = [splits[j] for j in range(5) if j != i]
    train = dd.concat(trains, axis=0)
    test = splits[i]
