
Pandas Left Merge Keeping Data In Right Dataframe On Duplicate Columns

I would like to merge two dataframes. df2 might have more columns than df and will always be 1 row. I would like the data from the df2 row to overwrite the matching row in df. NOTE: ser an

Solution 1:

Frankenstein Answer

df[['ser','no']].merge(df2,'left').set_axis(df.index).fillna(df)

   ser  no     c     d
0    0   0   1.0   NaN
1    0   1   1.0   NaN
2    0   2   1.0   NaN
3    1   0   1.0   NaN
4    1   1   1.0   NaN
5    1   2  88.0  90.0
6    2   0   1.0   NaN
7    2   1   1.0   NaN
8    2   2   1.0   NaN

Explanation

  1. I want to merge on the columns ['ser', 'no'] without having to specify them in the merge call. I also don't want goofy duplicate column names like 'c_x' and 'c_y'. So I slice only the key columns out of df and then merge; with no keys given, merge joins on the columns the two dataframes have in common

     df[['ser', 'no']].merge(df2, 'left')
    
  2. When I merge, I want only the rows from the left dataframe. In general, merge can produce a different number of rows than either input, so it returns a new index. NOTE: assuming the right dataframe (df2) has NO DUPLICATES with respect to ['ser', 'no'], a 'left' merge produces exactly the same number of rows as the left dataframe (df), in the same order. It won't necessarily have the same index, though. In this example it happens to, but I don't want to take chances, so I use set_axis

    set_axis(df.index)
    
  3. Finally, since the resulting dataframe now has the same index as df and shares its columns, I can fill in the missing bits with:

    fillna(df)
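
The three steps above can be sketched end to end as follows. The sample data is an assumption reconstructed from the output shown earlier, and the non-default index on df is a hypothetical tweak added only to show why set_axis matters. The validate argument is an optional extra guard, not part of the original answer: it makes merge raise if df2 ever contains duplicate ['ser', 'no'] pairs.

```python
import pandas as pd

# Assumed sample data, reconstructed from the output shown in the answer.
df = pd.DataFrame({
    "ser": [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "no":  [0, 1, 2, 0, 1, 2, 0, 1, 2],
    "c":   [1] * 9,
})
df.index = range(10, 19)  # hypothetical non-default index, for illustration
df2 = pd.DataFrame({"ser": [1], "no": [2], "c": [88], "d": [90]})

# Step 1: keep only the key columns on the left, then left-merge.
# validate="many_to_one" raises MergeError if the keys in df2 are not unique.
merged = df[["ser", "no"]].merge(df2, "left", validate="many_to_one")

# Step 2: the merge result carries a fresh 0..n-1 index; restore df's index.
merged = merged.set_axis(df.index)

# Step 3: fill the holes left by unmatched rows with the values from df.
result = merged.fillna(df)
print(result)
```

Without step 2, fillna(df) would align on index labels that do not overlap here, and none of the missing values in c would be filled.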
    

Solution 2:

Update: What you are looking for is combine_first:

(df2.set_index(['ser','no'])
    .combine_first(df.set_index(['ser','no']))
    .reset_index()
)
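
A minimal sketch of the combine_first approach, using assumed sample data reconstructed from the output in this thread. The frame df2_extra is hypothetical and is there only to show one difference from a left merge: combine_first returns the union of both indexes, so a key that exists only in the right-hand frame is added as a new row.

```python
import pandas as pd

# Assumed sample data, reconstructed from the output shown in this thread.
df = pd.DataFrame({
    "ser": [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "no":  [0, 1, 2, 0, 1, 2, 0, 1, 2],
    "c":   [1] * 9,
})
df2 = pd.DataFrame({"ser": [1], "no": [2], "c": [88], "d": [90]})
keys = ["ser", "no"]

# Values from df2 take priority; gaps are filled from df.
combined = (
    df2.set_index(keys)
       .combine_first(df.set_index(keys))
       .reset_index()
)
print(combined)

# Hypothetical row whose key (9, 9) does not exist in df.
df2_extra = pd.DataFrame({"ser": [9], "no": [9], "c": [5], "d": [6]})
with_extra = (
    df2_extra.set_index(keys)
             .combine_first(df.set_index(keys))
             .reset_index()
)
print(len(df), len(with_extra))  # the unmatched key shows up as an extra row
```

If rows that exist only in df2 should be dropped, the left merge from Solution 1 is the safer choice.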

You can also try concat followed by groupby, which behaves like an 'outer' merge when each (ser, no) pair is unique within each dataframe. Put df2 first so that its values take precedence.

pd.concat([df2,df]).groupby(['ser','no'], as_index=False).first()

Output:

   ser  no   c     d
0    0   0   1   NaN
1    0   1   1   NaN
2    0   2   1   NaN
3    1   0   1   NaN
4    1   1   1   NaN
5    1   2  88  90.0
6    2   0   1   NaN
7    2   1   1   NaN
8    2   2   1   NaN
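
The concat variant depends on the order of the frames, which the sketch below tries to make visible. The sample data is again an assumption reconstructed from the output above. GroupBy.first() returns the first non-null value per column in each group, so whichever frame is listed first wins wherever both have a value.

```python
import pandas as pd

# Assumed sample data, reconstructed from the output shown above.
df = pd.DataFrame({
    "ser": [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "no":  [0, 1, 2, 0, 1, 2, 0, 1, 2],
    "c":   [1] * 9,
})
df2 = pd.DataFrame({"ser": [1], "no": [2], "c": [88], "d": [90]})
keys = ["ser", "no"]

# df2 first: its values override df for the matching key.
df2_wins = pd.concat([df2, df]).groupby(keys, as_index=False).first()

# df first: df's value of c is kept, and only the missing d is taken from df2.
df_wins = pd.concat([df, df2]).groupby(keys, as_index=False).first()

print(df2_wins)
print(df_wins)
```

Because first() skips nulls, a NaN stored deliberately in df2 would not overwrite a value in df with this approach.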
