Dropping NaN Rows, Certain Columns In Specific Excel Files Using Glob/merge
In a for loop that loads in Excel files, I would like to drop the all-NaN rows from the final file, and drop the duplicated Company, Emails, and Created columns from all but the final file loaded in.
Solution 1:
To prevent multiple Company, Emails, Created, Facilitated Meetings and Attended Meetings columns, drop them from the right DataFrame. To remove rows with all NaN values, use result.dropna(how='all', axis=0):
import glob
import re
import functools

import pandas as pd

all_users_sheets_hosts = []
for f in glob.glob("./gowall-users-export-*.xlsx"):
    df = pd.read_excel(f)
    # Tag the Hosted Meetings column with the number parsed from the filename
    j = re.search(r'(\d+)', f)
    df.columns = df.columns.str.replace(r'.*Hosted Meetings.*',
                                        'Hosted Meetings ' + j.group(1),
                                        regex=True)
    all_users_sheets_hosts.append(df)

# Drop rows of all NaNs from the final DataFrame in `all_users_sheets_hosts`
all_users_sheets_hosts[-1] = all_users_sheets_hosts[-1].dropna(how='all', axis=0)

def mergefunc(left, right):
    # Drop the shared columns from the right frame so they are not duplicated
    cols = ['Company', 'Emails', 'Created', 'Facilitated Meetings', 'Attended Meetings']
    right = right.drop(cols, axis=1)
    result = pd.merge(left, right, on=['First Name', 'Last Name'], how='outer')
    return result

all_users_sheets_hosts = functools.reduce(mergefunc, all_users_sheets_hosts)
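To see why dropping the shared columns matters, here is a minimal sketch with made-up data: without the drop, pandas disambiguates the shared Company column with _x/_y suffixes; with it, a single Company column survives:

left = pd.DataFrame({'First Name': ['Ann'], 'Last Name': ['Lee'],
                     'Company': ['Acme'], 'Hosted Meetings 1': [3]})
right = pd.DataFrame({'First Name': ['Ann'], 'Last Name': ['Lee'],
                      'Company': ['Acme'], 'Hosted Meetings 2': [5]})

# Without the drop, the shared column is suffixed:
pd.merge(left, right, on=['First Name', 'Last Name'], how='outer').columns.tolist()
# ['First Name', 'Last Name', 'Company_x', 'Hosted Meetings 1', 'Company_y', 'Hosted Meetings 2']

# With the drop, only one Company column remains:
pd.merge(left, right.drop('Company', axis=1),
         on=['First Name', 'Last Name'], how='outer').columns.tolist()
# ['First Name', 'Last Name', 'Company', 'Hosted Meetings 1', 'Hosted Meetings 2']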
Since the Company et al. columns will only exist in the left DataFrame, there will be no proliferation of those columns. Note, however, that if the left and right DataFrames have different values in those columns, only the values from the first DataFrame in all_users_sheets_hosts will be kept.
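A small illustration of that caveat, again with invented values: because the Company column is dropped from the right frame before merging, the right frame's value never reaches the result:

left = pd.DataFrame({'First Name': ['Ann'], 'Last Name': ['Lee'], 'Company': ['Acme']})
right = pd.DataFrame({'First Name': ['Ann'], 'Last Name': ['Lee'], 'Company': ['Beta']})

merged = pd.merge(left, right.drop('Company', axis=1),
                  on=['First Name', 'Last Name'], how='outer')
merged['Company'].tolist()  # ['Acme'] -- the right frame's 'Beta' is silently discarded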
Alternatively, if the left and right DataFrames have the same values for the Company et al. columns, then another option would be to simply merge on those columns too:
def mergefunc(left, right):
    # Merge on every shared column, including Company et al., so none are duplicated
    cols = ['First Name', 'Last Name', 'Company', 'Emails', 'Created',
            'Facilitated Meetings', 'Attended Meetings']
    result = pd.merge(left, right, on=cols, how='outer')
    return result

all_users_sheets_hosts = functools.reduce(mergefunc, all_users_sheets_hosts)
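As a rough sketch of the difference, assuming the shared columns really do agree: merging on all of them collapses each person into a single row, while a disagreement in, say, Company would instead produce two partially filled rows in the outer merge:

left = pd.DataFrame({'First Name': ['Ann'], 'Last Name': ['Lee'],
                     'Company': ['Acme'], 'Hosted Meetings 1': [3]})
right = pd.DataFrame({'First Name': ['Ann'], 'Last Name': ['Lee'],
                      'Company': ['Acme'], 'Hosted Meetings 2': [5]})

pd.merge(left, right, on=['First Name', 'Last Name', 'Company'], how='outer')
#   First Name Last Name Company  Hosted Meetings 1  Hosted Meetings 2
# 0        Ann       Lee    Acme                  3                  5
# If Company disagreed, the outer merge would keep two rows instead, each
# with NaN in the meeting column that came from the other file.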