
Pandas: Replace One Cell's Value From Multiple Rows By One Particular Row Based On Other Columns

My aim:

   uniqueIdentity      beginTime  progrNumber
0  2018-02-07-6253554  17:40:29   1
1  2018-02-07-6253554  17:40:29   2
2  2018-02-07-6253554  17:40:29   3
3  2018-0

Solution 1:

As you mention in the comments, the lowest progrNumber will also be the lowest beginTime. This means you can just take the lowest beginTime per uniqueIdentity using groupby and transform.

Note if beginTime is of type string, this will only work if it has consistent formatting. (e.g. '09:40:20' instead of '9:40:20')

df['beginTime'] = df.groupby('uniqueIdentity').beginTime.transform('min')

    uniqueIdentity      beginTime  progrNumber
0   2018-02-07-6253554  17:40:29   1
1   2018-02-07-6253554  17:40:29   2
2   2018-02-07-5555333  17:48:29   3
3   2018-02-07-5555333  17:48:29   4
4   2018-02-07-6253554  17:40:29   3
5   2018-02-07-6253554  17:40:29   4
6   2018-02-07-5555333  17:48:29   1
7   2018-02-07-5555333  17:48:29   2
8   2018-02-07-2345622  18:40:29   1
9   2018-02-07-2345622  18:40:29   3
10  2018-02-07-2345622  18:40:29   4
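To illustrate the string-formatting caveat above, here is a minimal sketch on made-up data (the identifiers 'a' and 'b' and the times are hypothetical, not from the question). It assumes the times can be parsed by pd.to_timedelta, which avoids relying on zero-padded strings:

```python
import pandas as pd

# Hypothetical sample data; note the unpadded '9:40:20' in group 'a'.
df = pd.DataFrame({
    'uniqueIdentity': ['a', 'a', 'b', 'b'],
    'beginTime': ['17:40:29', '9:40:20', '18:00:00', '18:05:00'],
    'progrNumber': [2, 1, 1, 2],
})

# Plain string comparison: '17:40:29' < '9:40:20' because '1' sorts before '9',
# so group 'a' gets the wrong minimum.
as_strings = df.groupby('uniqueIdentity')['beginTime'].transform('min')

# Converting to timedelta first compares real durations instead of characters.
as_times = (
    pd.to_timedelta(df['beginTime'])
      .groupby(df['uniqueIdentity'])
      .transform('min')
)

print(as_strings.tolist())  # ['17:40:29', '17:40:29', '18:00:00', '18:00:00']
print(as_times.iloc[0])     # 0 days 09:40:20
```

If the column needs to stay as text afterwards, the timedelta result can be formatted back to strings; that step is left out here.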

Solution 2:

Using groupby and map

The hypothesis is that beginTime will always be minimal for a minimal progrNumber. This condition is true based on the question's comments.

In this answer, I collect the minimum beginTime of each uniqueIdentity and then map it to the original DataFrame based on uniqueIdentity.

times = df.groupby('uniqueIdentity').beginTime.min()
df['beginTime'] = df.uniqueIdentity.map(times)
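As a quick sanity check, the two lines above can be run end to end on a small frame. The data below is made up for illustration (the identifiers 'x' and 'y' are hypothetical), and the comparison with transform is only there to show that both routes give the same column under the stated hypothesis:

```python
import pandas as pd

# Hypothetical sample data: the rows of each group are not contiguous.
df = pd.DataFrame({
    'uniqueIdentity': ['x', 'x', 'y', 'y', 'x'],
    'beginTime': ['17:40:29', '17:45:00', '17:48:29', '17:50:10', '17:52:00'],
    'progrNumber': [1, 2, 1, 2, 3],
})

# One minimum beginTime per uniqueIdentity (a Series indexed by uniqueIdentity).
times = df.groupby('uniqueIdentity')['beginTime'].min()

# Look up each row's uniqueIdentity in that Series.
mapped = df['uniqueIdentity'].map(times)

# The transform-based route, for comparison.
transformed = df.groupby('uniqueIdentity')['beginTime'].transform('min')

print(mapped.tolist())
# ['17:40:29', '17:40:29', '17:48:29', '17:48:29', '17:40:29']
print(mapped.tolist() == transformed.tolist())  # True
```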

Solution 3:

Here's another option, using a left join and some renaming:

# find rows where progrNumber is 1
df_prog1 = df[df.progrNumber == 1]
# do a left join on the original
df = df.merge(df_prog1, on='uniqueIdentity', how='left', suffixes=('', '_y'))
# keep only the beginTime from the right frame
df = df[['uniqueIdentity', 'beginTime_y', 'progrNumber']]
# rename columns
df = df.rename(columns={'beginTime_y': 'beginTime'})
print(df)

This results in:

    uniqueIdentity      beginTime  progrNumber
0   2018-02-07-6253554  17:40:29   1
1   2018-02-07-6253554  17:40:29   2
2   2018-02-07-6253554  17:40:29   3
3   2018-02-07-6253554  17:40:29   4
4   2018-02-07-5555333  17:48:29   1
5   2018-02-07-5555333  17:48:29   2
6   2018-02-07-5555333  17:48:29   3
7   2018-02-07-5555333  17:48:29   4
8   2018-02-07-2345622  18:40:29   1
9   2018-02-07-2345622  18:40:29   2
10  2018-02-07-2345622  18:40:29   3
11  2018-02-07-2345622  18:40:29   4

If you're not sure which record within a uniqueIdentity will have the minimum time, you can use a groupby instead of selecting the rows where progrNumber == 1:

df_prog1=df.groupby('uniqueIdentity')['beginTime'].min().reset_index()

And do the left join as above.
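Put together, the groupby variant of the join might look like the sketch below. It is a slight variation rather than the exact code above: the old beginTime column is dropped before merging, so no suffixes or renaming are needed. The sample data and the names earliest and out are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    'uniqueIdentity': ['x', 'x', 'y', 'y', 'x'],
    'beginTime': ['17:40:29', '17:45:00', '17:48:29', '17:50:10', '17:52:00'],
    'progrNumber': [1, 2, 1, 2, 3],
})

# One row per uniqueIdentity holding its minimum beginTime.
earliest = df.groupby('uniqueIdentity', as_index=False)['beginTime'].min()

# Drop the old column, left-join the minimum back, restore the column order.
out = (
    df.drop(columns='beginTime')
      .merge(earliest, on='uniqueIdentity', how='left')
      [['uniqueIdentity', 'beginTime', 'progrNumber']]
)

print(out['beginTime'].tolist())
# ['17:40:29', '17:40:29', '17:48:29', '17:48:29', '17:40:29']
```

A left merge keeps the row order of the left frame, so out lines up with the original rows.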

Solution 4:

If the first beginTime for each user will always correspond to the minimum program number for each user, you can do:

d = df.groupby('uniqueIdentity')['beginTime'].first().to_dict()
df['beginTime'] = df['uniqueIdentity'].map(d)

To be more explicit about getting the time where the program number is minimum (regardless of its position), replace d in the above with:

d = df.groupby('uniqueIdentity').apply(lambda x: x['beginTime'][x['progrNumber'].idxmin()]).to_dict()

These two yield the same result for your example data, but they will differ if there are users for whom the first beginTime (or the minimum beginTime, per Hugolmn) does not correspond to the minimum progrNumber.
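A small made-up case where the three notions disagree may help. The user 'u1' and the times are hypothetical, and the last lookup uses idxmin with loc rather than apply, which is just one possible way to express "the time at the minimum progrNumber":

```python
import pandas as pd

# Hypothetical data: the first row, the earliest time and the lowest
# progrNumber all sit on different rows.
df = pd.DataFrame({
    'uniqueIdentity': ['u1', 'u1', 'u1'],
    'beginTime': ['17:50:00', '17:40:29', '17:30:00'],
    'progrNumber': [3, 1, 2],
})

by_first = df.groupby('uniqueIdentity')['beginTime'].first().to_dict()
by_min = df.groupby('uniqueIdentity')['beginTime'].min().to_dict()

# Row label of the minimum progrNumber per group, then look up its beginTime.
idx = df.groupby('uniqueIdentity')['progrNumber'].idxmin()
by_prog = df.loc[idx].set_index('uniqueIdentity')['beginTime'].to_dict()

print(by_first)  # {'u1': '17:50:00'}
print(by_min)    # {'u1': '17:30:00'}
print(by_prog)   # {'u1': '17:40:29'}
```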

Solution 5:

If we cannot assume that the min progrNumber is also the min beginTime, a more sophisticated approach is required:

df['beginTime'] = (
     df.groupby('uniqueIdentity', as_index=False, group_keys=False)
       .apply(lambda s: pd.Series(s[s.progrNumber==s.progrNumber.min()]
              .beginTime.item(), index=s.index)
       )
)

df
#     uniqueIdentity      beginTime  progrNumber
# 0   2018-02-07-6253554  17:40:29   1
# 1   2018-02-07-6253554  17:40:29   2
# 2   2018-02-07-6253554  17:40:29   3
# 3   2018-02-07-6253554  17:40:29   4
# 4   2018-02-07-6253554  17:40:29   5
# 5   2018-02-07-5555333  17:49:15   2
# 6   2018-02-07-5555333  17:49:15   3
# 7   2018-02-07-5555333  17:49:15   4
# 8   2018-02-07-2345622  18:40:29   1
# 9   2018-02-07-2345622  18:40:29   2
# 10  2018-02-07-2345622  18:40:29   3
# 11  2018-02-07-2345622  18:40:29   4

If you don't want a one-liner, an approach with map works well:

mapping = (
    df.groupby('uniqueIdentity')
      .apply(lambda s: s[s.progrNumber == s.progrNumber.min()].beginTime.iloc[0])
)

df['beginTime'] = df.uniqueIdentity.map(mapping)

Note: You can replace iloc[0] with item() if you can guarantee that only one row has the minimum progrNumber.
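The difference between the two accessors shows up when the minimum progrNumber is tied. In this made-up example (hypothetical user 'u1' with two rows at progrNumber 1), iloc[0] quietly takes the first candidate, while item() raises a ValueError because more than one value is left:

```python
import pandas as pd

# Hypothetical data with a tie on the minimum progrNumber.
df = pd.DataFrame({
    'uniqueIdentity': ['u1', 'u1', 'u1'],
    'beginTime': ['17:40:29', '17:41:00', '17:45:00'],
    'progrNumber': [1, 1, 2],
})

# beginTime values on the rows holding the minimum progrNumber (two rows here).
candidates = df.loc[df['progrNumber'] == df['progrNumber'].min(), 'beginTime']

# iloc[0] picks the first candidate without complaint.
first_value = candidates.iloc[0]

# item() only works on a Series of length 1, so it fails on a tie.
try:
    candidates.item()
    item_raised = False
except ValueError:
    item_raised = True

print(first_value)   # 17:40:29
print(item_raised)   # True
```

Which behaviour is preferable depends on whether a tie should count as a data error or not.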
