Add Extra Column As The Cumulative Time Difference
How to add an extra column that is the cumulative value of the time differences for each course? For example, the initial table is: id_A course weight ts_
Solution 1:
You can chain the diff method with cumsum:
# convert ts_A to datetime type
df.ts_A = pd.to_datetime(df.ts_A)
# convert ts_A to seconds, group by id and then use transform to calculate the cumulative difference
df['cum_delta_sec'] = df.ts_A.astype(int).div(10**9).groupby(df.id_A).transform(lambda x: x.diff().fillna(0).cumsum())
df
Solution 2:
Use groupby, transform, and .iloc:
df['ts_A'] = pd.to_datetime(df.ts_A)
df['cum_delta_sec'] = (df.groupby('id_A')['ts_A']
.transform(lambda x: (x - x.iloc[0]).dt.total_seconds()))
Output:
id_A course weight ts_A value cum_delta_sec
0 id1 cotton 3.5 2017-04-27 01:35:30 150.000000 0
1 id1 cotton 3.5 2017-04-27 01:36:00 416.666667 30
2 id1 cotton 3.5 2017-04-27 01:36:30 700.000000 60
3 id1 cotton 3.5 2017-04-27 01:37:00 950.000000 90
4 id2 cotton blue 5.0 2017-04-27 02:35:30 150.000000 0
5 id2 cotton blue 5.0 2017-04-27 02:36:00 450.000000 30
6 id2 cotton blue 5.0 2017-04-27 02:36:30 520.666667 60
7 id2 cotton blue 5.0 2017-04-27 02:37:00 610.000000 90
In the group, subtract current value from the first value and use .dt accessor to convert to seconds.
Solution 3:
import csv
import datetime as dt
with open('path/to/input') as fin, open('path/to/output', 'w') as fout:
infile = csv.DictReader(fin, delimiter='\t')
outfile = csv.DictWriter(fout, delimiter='\t', fieldnames=infile.fieldnames + ['cum_delta_sec'])
cdt = 0
last = None
for row in infile:
if last is None:
last = dt.strptime(row['ts_A'], "%Y-%m-%d %H:%M:%S")
row['cum_delta_sec'] = 0
outfile.writerow(row)
continue
cdt += (last - dt.strptime(row['ts_A'], "%Y-%m-%d %H:%M:%S")).total_seconds()
row['cum_delta_sec'] = cdt
outfile.writerow(row)

Post a Comment for "Add Extra Column As The Cumulative Time Difference"