How To Aggregate Only The Numerical Columns In A Mixed Dtypes Dataframe

I have a mixed pd.DataFrame: import pandas as pd import numpy as np df = pd.DataFrame({ 'A' : 1., 'B' : pd.Timestamp('20130102'), 'C' : pd

Solution 1:

Use select_dtypes to group by every non-numeric column, so that only the numeric columns are left to aggregate:

df.groupby(list(df.select_dtypes(exclude=[np.number]))).agg(np.median).reset_index()
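To make the select_dtypes idea concrete, here is a minimal sketch. The question's DataFrame is truncated above, so the frame below is a hypothetical stand-in (column names follow the question, values are invented), and `key_cols` / `num_cols` are illustrative names only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame (the original is truncated).
df = pd.DataFrame({
    "A": [1.0, 1.0, 3.0, 3.0],
    "B": pd.Timestamp("2013-01-02"),   # scalar is broadcast to every row
    "C": pd.Timestamp("2018-01-01"),
    "D": [1.0, 2.0, 3.0, 6.0],
    "F": "foo",
})

# Non-numeric columns become the grouping keys ...
key_cols = list(df.select_dtypes(exclude=[np.number]).columns)
# ... and numeric columns are the ones that get aggregated.
num_cols = list(df.select_dtypes(include=[np.number]).columns)

# Selecting num_cols explicitly keeps the median away from non-numeric data.
result = df.groupby(key_cols)[num_cols].median().reset_index()
print(result)
```

Note that this groups by every distinct combination of the non-numeric columns, so it only collapses to one row per 'B' when the other non-numeric columns are constant within each 'B'.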

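Or take the median per 'B' and concatenate the remaining columns back from the first row of each group. The concat is positional, so this assumes the values of 'B' first appear in df in sorted order: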

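# numeric_only=True skips the non-numeric columns, which recent pandas no longer drops silently
df1 = df.groupby('B', as_index=False).median(numeric_only=True)
# drop(columns=...) replaces the positional axis argument removed in pandas 2.0
pd.concat([df1, df.drop_duplicates(['B']).drop(columns=list(df1)).reset_index(drop=True)], axis=1)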
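Because concat with axis=1 lines rows up by position, the result can be misaligned when the groups do not appear in sorted order. A hedged alternative, not part of the original answer, is to merge on 'B' instead. The frame and the names `medians`, `extras`, and `merged` below are hypothetical.

```python
import pandas as pd

# Hypothetical frame in which the later date appears first.
df = pd.DataFrame({
    "A": [1.0, 3.0, 5.0, 7.0],
    "B": pd.to_datetime(["2013-01-03", "2013-01-03", "2013-01-02", "2013-01-02"]),
    "F": ["x", "x", "y", "y"],
})

# Median of the numeric columns per 'B' (groupby sorts the keys).
medians = df.groupby("B", as_index=False).median(numeric_only=True)

# First row per 'B', keeping only the key and the non-numeric columns.
extras = df.drop_duplicates("B").drop(columns=["A"])

# Merging on the key aligns rows by value of 'B', not by position.
merged = medians.merge(extras, on="B")
print(merged)
```

Here 'x' stays attached to 2013-01-03 and 'y' to 2013-01-02 even though the grouped result is sorted differently from the input.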

Solution 2:

If 'C' and 'F' are the same for each value of 'B', then you can include them in the groupby columns, like this:

df.groupby(['B','C','F']).agg(np.median).reset_index()

Or as @BradSolomn suggests:

df.groupby(['B','C','F'], as_index=False).agg(np.median)

Output:

           B          C    F    A         D
0 2013-01-02 2018-01-01  foo  1.0  0.392723
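Whether 'C' and 'F' really are constant within each 'B' can be checked before adding them to the groupby keys. The following is a small sketch on a hypothetical frame (values invented for illustration); `counts` and `constant` are illustrative names.

```python
import pandas as pd

# Hypothetical frame: 'C' is constant per 'B', 'F' is not.
df = pd.DataFrame({
    "B": pd.to_datetime(["2013-01-02"] * 3 + ["2013-01-03"]),
    "C": pd.Timestamp("2018-01-01"),
    "F": ["foo", "foo", "bar", "foo"],
})

# Number of distinct values of each column within each 'B' group.
counts = df.groupby("B")[["C", "F"]].nunique()

# A column is safe to use as an extra key if no group has more than one value.
constant = counts.le(1).all()
print(constant)
```

Columns flagged True here can be added to the groupby keys without changing the number of result rows.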

If not, then you'll need to aggregate 'C' and 'F' somehow, for example by taking the first value of 'C' and the last value of 'F':

df.groupby('B').agg({'D':np.median,'A':np.median,'C':'first','F':'last'}).reset_index() 

           B          C    F    A         D
0 2013-01-02 2018-01-01  foo  1.0  0.392723
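The aggregation dictionary does not have to be written by hand. As a sketch under assumptions (hypothetical data, and 'first' chosen arbitrarily as the rule for non-numeric columns), it can be built from the column dtypes:

```python
import numpy as np
import pandas as pd

# Hypothetical frame where 'C' and 'F' vary within the single 'B' group.
df = pd.DataFrame({
    "A": [1.0, 1.0, 3.0, 3.0],
    "B": pd.Timestamp("2013-01-02"),
    "C": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03", "2018-01-04"]),
    "D": [1.0, 2.0, 3.0, 6.0],
    "F": ["foo", "bar", "baz", "qux"],
})

key = "B"
numeric = df.select_dtypes(include=[np.number]).columns

# Median for numeric columns, first value for everything else.
spec = {
    col: ("median" if col in numeric else "first")
    for col in df.columns
    if col != key
}

result = df.groupby(key).agg(spec).reset_index()
print(result)
```

Swapping 'first' for 'last', 'min', or another reducer only requires changing the expression inside `spec`.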
