Python Pandas: How To Merge Based On An "or" Condition?
Let's say I have two dataframes, and the column names for both are: table 1 columns: [ShipNumber, TrackNumber, ShipDate, Quantity, Weight] table 2 columns: [ShipNumber, TrackNumber
Solution 1:
Use merge() and concat(). Then drop any duplicate cases where both A and B match (thanks @Scott Boston for that final step).
import pandas as pd

df1 = pd.DataFrame({'A': [3, 2, 1, 4], 'B': [7, 8, 9, 5]})
df2 = pd.DataFrame({'A': [1, 5, 6, 4], 'B': [4, 1, 8, 5]})
   df1        df2
   A  B       A  B
0  3  7    0  1  4
1  2  8    1  5  1
2  1  9    2  6  8
3  4  5    3  4  5
With these data frames we should see:
df1.loc[2] matches A on df2.loc[0]
df1.loc[1] matches B on df2.loc[2]
df1.loc[3] matches both A and B on df2.loc[3]
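As a sanity check on the expected matches, here is a minimal brute-force sketch that compares every row of df1 with every row of df2. The `matches` list and the nested loop are illustrative only; they are fine for tiny frames but not something to use on real data.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [3, 2, 1, 4], 'B': [7, 8, 9, 5]})
df2 = pd.DataFrame({'A': [1, 5, 6, 4], 'B': [4, 1, 8, 5]})

# Record (df1 label, df2 label, matched on A?, matched on B?) for every
# pair of rows that agrees on A or on B.
matches = []
for i, left in df1.iterrows():
    for j, right in df2.iterrows():
        on_a = bool(left['A'] == right['A'])
        on_b = bool(left['B'] == right['B'])
        if on_a or on_b:
            matches.append((i, j, on_a, on_b))

for i, j, on_a, on_b in matches:
    print(f"df1.loc[{i}] ~ df2.loc[{j}]  A match: {on_a}  B match: {on_b}")
```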
We'll use suffixes to keep track of what matched where:
suff_A = ['_on_A_match_1', '_on_A_match_2']
suff_B = ['_on_B_match_1', '_on_B_match_2']
df = pd.concat([df1.merge(df2, on='A', suffixes=suff_A),
df1.merge(df2, on='B', suffixes=suff_B)])
     A  A_on_B_match_1  A_on_B_match_2    B  B_on_A_match_1  B_on_A_match_2
0  1.0             NaN             NaN  NaN             9.0             4.0
1  4.0             NaN             NaN  NaN             5.0             5.0
0  NaN             2.0             6.0  8.0             NaN             NaN
1  NaN             4.0             4.0  5.0             NaN             NaN
Note that the second and fourth rows are duplicate matches (for both data frames, A = 4 and B = 5). We need to remove one of those sets.
dups = (df.B_on_A_match_1 == df.B_on_A_match_2)  # also could remove A_on_B_match
df.loc[~dups]
     A  A_on_B_match_1  A_on_B_match_2    B  B_on_A_match_1  B_on_A_match_2
0  1.0             NaN             NaN  NaN             9.0             4.0
0  NaN             2.0             6.0  8.0             NaN             NaN
1  NaN             4.0             4.0  5.0             NaN             NaN
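The duplicate check above relies on the two B columns happening to be equal. A variation, not part of the original answer, is to carry each frame's row label through the merges and drop repeated (df1 row, df2 row) pairs instead. The column names `idx_1` and `idx_2` are assumptions made for this sketch.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [3, 2, 1, 4], 'B': [7, 8, 9, 5]})
df2 = pd.DataFrame({'A': [1, 5, 6, 4], 'B': [4, 1, 8, 5]})

# Keep each frame's original row label as an ordinary column.
left = df1.rename_axis('idx_1').reset_index()
right = df2.rename_axis('idx_2').reset_index()

# One inner merge per key of the "or" condition.
on_a = left.merge(right, on='A', suffixes=('_1', '_2'))
on_b = left.merge(right, on='B', suffixes=('_1', '_2'))

# A pair that matched on both keys shows up once per merge; keep one copy.
pairs = pd.concat(
    [on_a[['idx_1', 'idx_2']], on_b[['idx_1', 'idx_2']]],
    ignore_index=True,
).drop_duplicates()

print(pairs)
```

The resulting `pairs` frame can then be joined back to df1 and df2 on the row labels to pull in whichever columns are needed.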
Solution 2:
I would suggest an alternative way of doing a merge like this, which seems easier to me.
table1["id_to_be_merged"] = table1.apply(
    lambda row: row["ShipNumber"] if pd.notnull(row["ShipNumber"]) else row["TrackNumber"], axis=1)
You can add the same column to table2 as well if needed, and then use it in left_on or right_on based on your requirement.
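A minimal sketch of this approach follows. The sample rows, the `Quantity` and `Status` values, and the `merge_key` column name are made up for illustration; only ShipNumber and TrackNumber come from the question. It uses `fillna()` as a vectorized stand-in for the row-wise `apply()`.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
table1 = pd.DataFrame({
    'ShipNumber': ['S1', None, 'S3'],
    'TrackNumber': ['T1', 'T2', 'T3'],
    'Quantity': [5, 7, 9],
})
table2 = pd.DataFrame({
    'ShipNumber': ['S1', None, 'S4'],
    'TrackNumber': ['T9', 'T2', 'T3'],
    'Status': ['ok', 'late', 'ok'],
})

# Use ShipNumber where present, otherwise fall back to TrackNumber.
table1['id_to_be_merged'] = table1['ShipNumber'].fillna(table1['TrackNumber'])
table2['merge_key'] = table2['ShipNumber'].fillna(table2['TrackNumber'])

merged = table1.merge(
    table2,
    left_on='id_to_be_merged',
    right_on='merge_key',
    suffixes=('_1', '_2'),
)
print(merged[['id_to_be_merged', 'Quantity', 'Status']])
```

Note that the key only falls back to TrackNumber when ShipNumber is missing, so the last rows of the two sample tables (different ShipNumber, same TrackNumber) are not paired. If those should match too, the merge-and-concat approach from Solution 1 is the closer fit.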