Skip to content Skip to sidebar Skip to footer

Dropping Nan Rows, Certain Columns In Specific Excel Files Using Glob/merge

I would like to drop NaN rows in the final file in a for loop loading in excel files, and dropping all company, emails, created duplicated columns from all but the final loaded in

Solution 1:

To prevent multiple Company, Emails, Created, Facilitated Meetings and Attended Meetings columns, drop them from the right DataFrame. To remove rows with all NaN values, use result.dropna(how='all', axis=0):

import pandas as pd
import functools

for f in glob.glob("./gowall-users-export-*.xlsx"):
    df = pd.read_excel(f)
    all_users_sheets_hosts.append(df)
    j = re.search('(\d+)', f)
    df.columns = df.columns.str.replace('.*Hosted Meetings.*', 
                                        'Hosted Meetings'+' '+ j.group(1))

# Droprowsofall NaNs from the final DataFrame in `all_users_sheets_hosts`
all_users_sheets_hosts[-1] = all_users_sheets_hosts[-1].dropna(how='all', axis=0)

def mergefunc(left, right):
    cols = ['Company', 'Emails', 'Created', 'Facilitated Meetings', 'Attended Meetings']
    right= right.drop(cols, axis=1)
    result= pd.merge(left, right, on=['First Name', 'Last Name'], how='outer')
    returnresult

all_users_sheets_hosts = functools.reduce(mergefunc, all_users_sheets_hosts)

Since the Company et. al. columns will only exist in the left DataFrame, there will be no proliferation of those columns. Note, however, that if the left and right DataFrames have different values in those columns, only the values in the first DataFrame in all_users_sheets_hosts will be kept.


Alternative, if the left and right DataFrames have the same values for the Company et. al. columns, then another option would be to simple merge on those columns too:

def mergefunc(left, right):
    cols = ['First Name', 'Last Name', 'Company', 'Emails', 'Created', 
            'Facilitated Meetings', 'Attended Meetings']
    result= pd.merge(left, right, on=cols, how='outer')
    returnresult
all_users_sheets_hosts = functools.reduce(mergefunc, all_users_sheets_hosts)

Post a Comment for "Dropping Nan Rows, Certain Columns In Specific Excel Files Using Glob/merge"