Filling Missing Data By Random Choosing From Non Missing Values In Pandas Dataframe
I have a pandas DataFrame with several missing values. I noticed that the non-missing values are close to each other, so I would like to impute the missing values by randomly choosing from the non-missing values.
Solution 1:
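You can use the pandas fillna method together with random.choices to fill the missing values with random selections from the same column. fillna does not call a function for each element, and a comparison such as != np.nan is always True, so build a Series of random draws and pass that instead:
import random
import pandas as pd
observed = df["column"].dropna().tolist()
# One random draw per row; fillna only uses the draws at the missing positions.
fill = pd.Series(random.choices(observed, k=len(df)), index=df.index)
df["column"] = df["column"].fillna(fill)
Where "column" is the column whose NaN values you want to fill with randomly chosen non-NaN values.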
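To see the idea end to end, here is a minimal sketch on toy data. The helper name `fill_random`, the sample DataFrame, and the seeded NumPy generator are all assumptions made for illustration; they are not part of the answer above.

```python
import numpy as np
import pandas as pd

def fill_random(series, rng):
    """Return a copy of `series` with each NaN replaced by a random observed value."""
    out = series.copy()
    mask = out.isna()
    observed = out[~mask].to_numpy()
    if mask.any() and observed.size > 0:
        # One independent draw (with replacement) per missing entry.
        out[mask] = rng.choice(observed, size=int(mask.sum()))
    return out

df = pd.DataFrame({"column": [1.0, np.nan, 3.0, np.nan, 2.0]})  # toy data
rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
df["column"] = fill_random(df["column"], rng)
```

Passing the generator in explicitly keeps the randomness controllable, which may help when the imputation needs to be repeated.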
Solution 2:
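This works well for me on a pandas DataFrame:
import random

def randomiseMissingData(df2):
    "randomise missing data for DataFrame (within a column)"
    df = df2.copy()
    for col in df.columns:
        data = df[col]
        mask = data.isnull()
        if not mask.any():
            continue
        # Sample with replacement from the observed values, one draw per missing entry.
        samples = random.choices(data[~mask].values, k=int(mask.sum()))
        # Assign through .loc so the copied DataFrame itself is updated.
        df.loc[mask, col] = samples
    return df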
Solution 3:
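This is another approach to the question, improving on the first answer and using np.isnan to check whether a value is NaN, as described in the NumPy documentation. Note that it fills each NaN with a random integer between the column's minimum and maximum rather than with one of the observed values (numpy is assumed to be imported as np):
lo, hi = int(foo['A'].min()), int(foo['A'].max())
foo['A'] = foo['A'].apply(lambda x: np.random.randint(lo, hi + 1) if np.isnan(x) else x)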
Solution 4:
I did this to fill NaN values with a random non-NaN value. Note that the value is chosen once, so every NaN in the column receives the same value:
import random
df['column'] = df['column'].fillna(random.choice(df['column'].dropna().tolist()))
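Because random.choice is evaluated once, before fillna runs, all missing entries end up with the same value. If an independent draw per missing entry is preferred, one possible sketch uses Series.sample with replacement. The toy DataFrame and the fixed random_state below are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column": [10.0, np.nan, 12.0, np.nan, np.nan, 11.0]})  # toy data

mask = df["column"].isna()
# Series.sample draws with replacement from the observed values;
# random_state is fixed here only for reproducibility.
draws = df["column"].dropna().sample(n=int(mask.sum()), replace=True, random_state=0)
# Use the raw values so the sampled rows' index labels are not used for alignment.
df.loc[mask, "column"] = draws.to_numpy()
```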
Solution 5:
Here is another pandas DataFrame approach:
import numpy as np

def fill_with_random(df2, column):
    '''Fill `df2`'s column with name `column` with random data based on non-NaN data from `column`'''
    df = df2.copy()
    # Compute the pool of observed values once rather than once per row.
    observed = df[column].dropna().values
    df[column] = df[column].apply(lambda x: np.random.choice(observed) if np.isnan(x) else x)
    return df
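One caveat: np.isnan raises a TypeError on strings and other non-numeric objects, so the function above is limited to numeric columns. A hedged variant that swaps in pd.isna might look like the sketch below. The function name `fill_with_random_any_dtype`, the `seed` parameter, and the toy data are hypothetical, and the sketch assumes the column has at least one non-missing value.

```python
import numpy as np
import pandas as pd

def fill_with_random_any_dtype(df2, column, seed=None):
    """Like the function above, but uses pd.isna so object/string columns work too."""
    df = df2.copy()
    rng = np.random.default_rng(seed)
    # Assumes at least one observed (non-missing) value exists in the column.
    observed = df[column].dropna().tolist()
    df[column] = df[column].apply(
        lambda x: observed[rng.integers(len(observed))] if pd.isna(x) else x
    )
    return df

colours = pd.DataFrame({"colour": ["red", None, "blue", np.nan, "red"]})  # toy data
filled = fill_with_random_any_dtype(colours, "colour", seed=0)
```

The original DataFrame is left unchanged because the function works on a copy.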