Pandas: Replace One Cell's Value From Multiple Rows By One Particular Row Based On Other Columns
Solution 1:
As you mention in the comments, the lowest progrNumber will also be the lowest beginTime. This means you can just take the lowest beginTime per uniqueIdentity using groupby and transform.
Note: if beginTime is of type string, this will only work if it has consistent formatting (e.g. '09:40:20' instead of '9:40:20').
df['beginTime'] = df.groupby('uniqueIdentity').beginTime.transform('min')

        uniqueIdentity beginTime  progrNumber
0   2018-02-07-6253554  17:40:29            1
1   2018-02-07-6253554  17:40:29            2
2   2018-02-07-5555333  17:48:29            3
3   2018-02-07-5555333  17:48:29            4
4   2018-02-07-6253554  17:40:29            3
5   2018-02-07-6253554  17:40:29            4
6   2018-02-07-5555333  17:48:29            1
7   2018-02-07-5555333  17:48:29            2
8   2018-02-07-2345622  18:40:29            1
9   2018-02-07-2345622  18:40:29            3
10  2018-02-07-2345622  18:40:29            4
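The formatting caveat above can be illustrated with a small sketch. The sample frame is made up for illustration, and converting with pd.to_timedelta is one possible workaround that I am assuming here; it is not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data: note the unpadded '9:40:20'.
df = pd.DataFrame({
    "uniqueIdentity": ["a", "a", "b", "b"],
    "beginTime": ["9:40:20", "10:05:00", "17:48:29", "17:50:00"],
    "progrNumber": [1, 2, 1, 2],
})

# String comparison is lexicographic, so '10:05:00' sorts before '9:40:20'.
as_strings = df.groupby("uniqueIdentity")["beginTime"].transform("min")

# Converting to timedelta first compares the actual durations instead.
as_timedelta = (
    pd.to_timedelta(df["beginTime"])
    .groupby(df["uniqueIdentity"])
    .transform("min")
)

print(as_strings)
print(as_timedelta)
```

With zero-padded strings such as '09:40:20' the two approaches should agree, which is why the plain string version works on the data in the question.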
Solution 2:
Using groupby and map
The hypothesis is that beginTime will always be minimal for a minimal progrNumber. This condition is true based on the question's comments.
In this answer, I collect the minimum beginTime of each uniqueIdentity and then map it to the original DataFrame based on uniqueIdentity.
times = df.groupby('uniqueIdentity').beginTime.min()
df['beginTime'] = df.uniqueIdentity.map(times)
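For a version that can be run on its own, here is a minimal sketch of the same two steps. The sample rows are hypothetical and only mimic the shape of the data in the question.

```python
import pandas as pd

# Hypothetical sample data in the shape described in the question.
df = pd.DataFrame({
    "uniqueIdentity": ["2018-02-07-6253554"] * 2 + ["2018-02-07-5555333"] * 2,
    "beginTime": ["17:40:29", "17:44:10", "17:48:29", "17:52:01"],
    "progrNumber": [1, 2, 1, 2],
})

# One minimum beginTime per uniqueIdentity (a Series indexed by uniqueIdentity).
times = df.groupby("uniqueIdentity")["beginTime"].min()

# Look up each row's uniqueIdentity in that Series.
df["beginTime"] = df["uniqueIdentity"].map(times)
print(df)
```

The intermediate Series is small (one entry per uniqueIdentity), which makes it easy to inspect before overwriting the column.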
Solution 3:
Here's another option using a left join and some renaming:
# find rows where progrNumber is 1
df_prog1 = df[df.progrNumber == 1]
# do a left join on the original
df = df.merge(df_prog1, on='uniqueIdentity', how='left', suffixes=('', '_y'))
# keep only the beginTime from the right frame
df = df[['uniqueIdentity', 'beginTime_y', 'progrNumber']]
# rename columns
df = df.rename(columns={'beginTime_y': 'beginTime'})
print(df)
Results in:

        uniqueIdentity beginTime  progrNumber
0   2018-02-07-6253554  17:40:29            1
1   2018-02-07-6253554  17:40:29            2
2   2018-02-07-6253554  17:40:29            3
3   2018-02-07-6253554  17:40:29            4
4   2018-02-07-5555333  17:48:29            1
5   2018-02-07-5555333  17:48:29            2
6   2018-02-07-5555333  17:48:29            3
7   2018-02-07-5555333  17:48:29            4
8   2018-02-07-2345622  18:40:29            1
9   2018-02-07-2345622  18:40:29            2
10  2018-02-07-2345622  18:40:29            3
11  2018-02-07-2345622  18:40:29            4
If you're not sure which record within a uniqueIdentity will have the minimum time, you can use a groupby instead of selecting where progrNumber==1:
df_prog1=df.groupby('uniqueIdentity')['beginTime'].min().reset_index()
And do the left join as above.
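Put together, the groupby variant of the join might look like the sketch below. The sample data and the names df_min, merged and result are my own placeholders, not from the original answer.

```python
import pandas as pd

# Hypothetical sample data; the minimum time for 'a' is not on the first row.
df = pd.DataFrame({
    "uniqueIdentity": ["a", "a", "b", "b"],
    "beginTime": ["17:44:10", "17:40:29", "17:48:29", "17:52:01"],
    "progrNumber": [2, 1, 1, 2],
})

# Minimum beginTime per uniqueIdentity, as a two-column frame.
df_min = df.groupby("uniqueIdentity")["beginTime"].min().reset_index()

# Left join back onto the original; the right frame's beginTime gets the '_y' suffix.
merged = df.merge(df_min, on="uniqueIdentity", how="left", suffixes=("", "_y"))

# Keep the joined beginTime and restore the original column name.
result = (
    merged[["uniqueIdentity", "beginTime_y", "progrNumber"]]
    .rename(columns={"beginTime_y": "beginTime"})
)
print(result)
```

Because the right-hand frame has exactly one row per uniqueIdentity, the left join keeps the original row count.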
Solution 4:
If the first beginTime for each user will always correspond to the minimum program number for each user, you can do:
d = df.groupby('uniqueIdentity')['beginTime'].first().to_dict()
df['beginTime'] = df['uniqueIdentity'].map(d)
To be more explicit about getting the time where the program number is minimum (regardless of its position), you can replace d in the above with:
d = df.groupby('uniqueIdentity').apply(lambda x: x['beginTime'][x['progrNumber'].idxmin()]).to_dict()
These two yield the same result for your example data, but they will differ if there are users where the first beginTime (or minimum beginTime per Hugolmn) does not correspond to the minimum progrNumber for the user.
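To make the difference concrete, here is a small sketch on made-up data where the three strategies disagree. It uses df.loc with idxmin as an alternative to the apply-based lambda; that substitution is mine, not the original answer's.

```python
import pandas as pd

# Hypothetical data: for user 'b' the lowest progrNumber (2) is neither on the
# first row nor on the row with the earliest beginTime.
df = pd.DataFrame({
    "uniqueIdentity": ["a", "a", "b", "b", "b"],
    "beginTime": ["17:40:29", "17:45:00", "17:48:29", "17:49:15", "17:55:00"],
    "progrNumber": [1, 2, 3, 2, 4],
})

grouped = df.groupby("uniqueIdentity")

# First beginTime per user, and minimum beginTime per user.
by_first = grouped["beginTime"].first().to_dict()
by_min = grouped["beginTime"].min().to_dict()

# Row labels of the minimum progrNumber per user, then the beginTime on those rows.
min_rows = df.loc[grouped["progrNumber"].idxmin()]
by_progr = min_rows.set_index("uniqueIdentity")["beginTime"].to_dict()

print(by_first, by_min, by_progr)
```

Whichever dictionary matches the intended rule can then be passed to df['uniqueIdentity'].map(...) as shown above.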
Solution 5:
If we cannot assume that the min progrNumber is also the min beginTime, a more sophisticated approach is required:
df['beginTime'] = (
    df.groupby('uniqueIdentity', as_index=False, group_keys=False)
    .apply(lambda s: pd.Series(s[s.progrNumber==s.progrNumber.min()]
                               .beginTime.item(), index=s.index)
    )
)
df
#         uniqueIdentity beginTime  progrNumber
# 0   2018-02-07-6253554  17:40:29            1
# 1   2018-02-07-6253554  17:40:29            2
# 2   2018-02-07-6253554  17:40:29            3
# 3   2018-02-07-6253554  17:40:29            4
# 4   2018-02-07-6253554  17:40:29            5
# 5   2018-02-07-5555333  17:49:15            2
# 6   2018-02-07-5555333  17:49:15            3
# 7   2018-02-07-5555333  17:49:15            4
# 8   2018-02-07-2345622  18:40:29            1
# 9   2018-02-07-2345622  18:40:29            2
# 10  2018-02-07-2345622  18:40:29            3
# 11  2018-02-07-2345622  18:40:29            4
If you don't want a one-liner, an approach with map would be ideal:
mapping = (
    df.groupby('uniqueIdentity')
    .apply(lambda s: s[s.progrNumber==s.progrNumber.min()].beginTime.iloc[0])
)
df['beginTime'] = df.uniqueIdentity.map(mapping)
Note: you can replace iloc[0] with item() if you can guarantee that only one row has the minimum progrNumber.
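The reason for that caveat can be seen in a short sketch. The group below is hypothetical and deliberately contains a tie on the minimum progrNumber; Series.item() raises ValueError when the selection does not hold exactly one element.

```python
import pandas as pd

# Hypothetical group in which two rows share the minimum progrNumber.
group = pd.DataFrame({
    "beginTime": ["17:48:29", "17:49:15", "17:55:00"],
    "progrNumber": [1, 1, 2],
})

# All beginTime values on rows that carry the minimum progrNumber.
candidates = group[group.progrNumber == group.progrNumber.min()].beginTime

# iloc[0] silently picks the first matching row.
first_match = candidates.iloc[0]

# item() requires exactly one element and raises ValueError otherwise.
try:
    candidates.item()
    item_failed = False
except ValueError:
    item_failed = True

print(first_match, item_failed)
```

So item() can double as a sanity check on the data, while iloc[0] is the more forgiving choice when ties are possible.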