Comparing Two Timeseries Dataframes Based On Some Conditions In Pandas
I have two timeseries dataframes df1 and df2: df1 = pd.DataFrame({'date_1':['10/11/2017 0:00','10/11/2017 03:00','10/11/2017 06:00','10/11/2017 09:00'], 'value_1'
Solution 1:
Input data:
>>> df1
                     value_1
date_1
2017-10-11 00:00:00   5000.0
2017-10-11 03:00:00   1500.0
2017-10-11 06:00:00   1200.0
2017-10-11 09:00:00      NaN

>>> df2
                     value_2
date_2
2017-10-11 00:00:00   1500.0
2017-10-11 00:30:00   2050.0
2017-10-11 00:50:00      NaN
2017-10-11 01:20:00   2400.0
2017-10-11 01:40:00   2500.0
...
2017-10-11 08:20:00   2400.0
2017-10-11 08:50:00   2600.0
2017-10-11 09:20:00      NaN
2017-10-11 09:50:00   8000.0
2017-10-11 10:20:00   9000.0
- Fill the NaN values in df2 by linear interpolation between t-1 and t+1:
df2['value_2'] = df2['value_2'].interpolate()
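A minimal sketch of what this step does, using three rows taken from the df2 printout above. Note that the default `interpolate()` treats the rows as equally spaced; `method="time"` is shown only as a possible alternative if the fill should be weighted by the actual timestamps, and is not what the answer uses.

```python
import numpy as np
import pandas as pd

# Three consecutive rows from df2, with the NaN at 00:50
idx = pd.to_datetime(["2017-10-11 00:30", "2017-10-11 00:50", "2017-10-11 01:20"])
s = pd.Series([2050.0, np.nan, 2400.0], index=idx, name="value_2")

# Default method='linear': midpoint of the neighbours, index spacing ignored
by_position = s.interpolate()

# Alternative (assumption, not in the answer): weight by elapsed time
by_time = s.interpolate(method="time")

print(by_position.iloc[1])  # 2225.0
print(by_time.iloc[1])      # 2190.0 (20 of the 50 minutes have elapsed)
```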
- Create an interval from df1 according to your rules:
ii = pd.IntervalIndex.from_tuples(
    list(zip(df1.index - pd.DateOffset(hours=1, minutes=29),
             df1.index + pd.DateOffset(hours=1, minutes=30)))
)
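To see what these bins look like, here is a small sketch that rebuilds them from the df1 index printed above. It swaps in `pd.Timedelta` and `IntervalIndex.from_arrays` for brevity (the window is the same), and the probe timestamps are made up purely to show the edge cases.

```python
import pandas as pd

# df1's index as printed in the input data
df1_index = pd.to_datetime([
    "2017-10-11 00:00", "2017-10-11 03:00",
    "2017-10-11 06:00", "2017-10-11 09:00",
])

# Same window as the answer, written with Timedelta instead of DateOffset
ii = pd.IntervalIndex.from_arrays(
    df1_index - pd.Timedelta(hours=1, minutes=29),
    df1_index + pd.Timedelta(hours=1, minutes=30),
)  # closed='right' by default, so each bin is (left, right]

# Hypothetical probe timestamps, chosen only to illustrate the bin edges
probes = pd.to_datetime([
    "2017-10-11 01:30:00",  # right edge of the first bin, included
    "2017-10-11 01:30:30",  # falls in the one-minute gap between bins
    "2017-10-11 01:40:00",  # inside the second bin
])
positions = ii.get_indexer(probes)  # -1 means "no bin"
print(list(positions))  # [0, -1, 1]
```

The 1:29 / 1:30 offsets leave a one-minute gap between neighbouring bins, which keeps them non-overlapping. That matters for the next step, because `pd.cut` does not accept an overlapping IntervalIndex.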
- Bin values into discrete intervals:
df1['interval'] = pd.cut(df1.index, bins=ii)
df2['interval'] = pd.cut(df2.index, bins=ii)
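As a quick illustration of the binning, the sketch below labels three df2 rows (values from the printout above) with the first two bins. The bin edges are hard-coded here so the snippet stands alone.

```python
import pandas as pd

# First two bins produced by the previous step, written out by hand
ii = pd.IntervalIndex.from_tuples([
    (pd.Timestamp("2017-10-10 22:31"), pd.Timestamp("2017-10-11 01:30")),
    (pd.Timestamp("2017-10-11 01:31"), pd.Timestamp("2017-10-11 04:30")),
])

# Three rows from df2
df2 = pd.DataFrame(
    {"value_2": [1500.0, 2400.0, 2500.0]},
    index=pd.to_datetime(
        ["2017-10-11 00:00", "2017-10-11 01:20", "2017-10-11 01:40"]
    ),
)

# Each timestamp gets the interval it falls into as a categorical label
df2["interval"] = pd.cut(df2.index, bins=ii)
codes = df2["interval"].cat.codes.tolist()
print(codes)  # [0, 0, 1]
```

Because df1 and df2 are cut with the same bins, the resulting `interval` column can serve as a shared key between them.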
- Merge the two dataframes on interval:
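# Attach the matching value_1 to every df2 row via the shared interval
dfx = pd.merge(df2, df1, on='interval', how='left').set_index('interval')
# count is 1 where value_2 < 2800 and value_1 > 1600, otherwise 0
dfx = (dfx['value_2'].lt(2800) & dfx['value_1'].gt(1600)) \
          .astype(int).to_frame('count').set_index(df2.index)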
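The flag itself can be checked on a tiny hand-built frame. The sketch below assumes the merge has already paired each `value_2` with its `value_1`; the four rows are picked from the data above to cover each case, including a missing `value_1`.

```python
import numpy as np
import pandas as pd

# Stand-in for the merged frame: one value_1 per value_2 row
merged = pd.DataFrame({
    "value_2": [1500.0, 2050.0, 2500.0, 8000.0],
    "value_1": [5000.0, 5000.0, 1500.0, np.nan],
})

# Same thresholds as the answer; comparisons against NaN evaluate to False
flag = (merged["value_2"].lt(2800) & merged["value_1"].gt(1600)).astype(int)
print(flag.tolist())  # [1, 1, 0, 0]
```

Rows with a missing `value_1` (or a df2 timestamp that fell into no bin) therefore end up with a count of 0 rather than NaN.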
- Append the index of df1, resampled at a frequency of 90 minutes, to the index of df2:
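# Add df1's 90-minute grid points to df2's timestamps
dti = df2.index.append(
    pd.DatetimeIndex(df1.index.to_series().resample('90min').groups.keys())
).sort_values().drop_duplicates()
# Carry each count forward onto the added grid points
dfx = dfx.reindex(dti).ffill().astype(int)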
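The point of this step is to insert the 90-minute boundaries (01:30, 04:30, and so on) between the df2 timestamps, so that a duration can stop at a boundary. A rough sketch of the idea, under the assumption that `pd.date_range` produces the same grid as the resample keys for this data:

```python
import pandas as pd

# df1's index as printed in the input data
df1_index = pd.to_datetime([
    "2017-10-11 00:00", "2017-10-11 03:00",
    "2017-10-11 06:00", "2017-10-11 09:00",
])
# Two df2 timestamps that sit on either side of the 01:30 boundary
df2_index = pd.to_datetime(["2017-10-11 01:20", "2017-10-11 01:40"])

# Assumed equivalent of the resample keys: a 90-minute grid over df1's range
boundaries = pd.date_range(df1_index.min(), df1_index.max(), freq="90min")

# Merge the grid into df2's timestamps, sorted and without duplicates
dti = df2_index.append(boundaries).sort_values().drop_duplicates()
print(dti[1:4])  # 01:20, 01:30, 01:40
```

With 01:30 now sitting between 01:20 and 01:40, the gap measured from 01:20 is 10 minutes instead of 20.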
- Compute duration from count and reindex from df2:
dfx['duration'] = dfx.index.to_series().diff(-1).abs() \
                     .fillna(pd.Timedelta(0)).dt.components \
                     .apply(lambda x: f"{x['hours']:02}:{x['minutes']:02}",
                            axis='columns')
dfx.loc[dfx['count'] == 0, 'duration'] = '00:00'
dfx = dfx.reindex(df2.index)
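The duration expression is dense, so here it is on its own for the first three timestamps of df2. `diff(-1)` subtracts the next row from the current one, which is why `abs()` is needed, and the last row has no successor, so it is filled with a zero Timedelta.

```python
import pandas as pd

idx = pd.to_datetime(
    ["2017-10-11 00:00", "2017-10-11 00:30", "2017-10-11 00:50"]
)

# Time until the next row; the final row has no next row, so it becomes 0
gaps = idx.to_series().diff(-1).abs().fillna(pd.Timedelta(0))

# Format each gap as HH:MM from its hour and minute components
labels = gaps.dt.components.apply(
    lambda x: f"{x['hours']:02}:{x['minutes']:02}", axis="columns"
)
print(labels.tolist())  # ['00:30', '00:20', '00:00']
```

One caveat: only the `hours` and `minutes` components are used, so a gap of a day or more would lose its `days` part. That does not arise with the data in this question.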
Output result:
>>> dfx
                     count duration
date_2
2017-10-11 00:00:00      1    00:30
2017-10-11 00:30:00      1    00:20
2017-10-11 00:50:00      1    00:30
2017-10-11 01:20:00      1    00:10
2017-10-11 01:40:00      0    00:00
2017-10-11 02:20:00      0    00:00
2017-10-11 02:50:00      0    00:00
2017-10-11 03:00:00      0    00:00
2017-10-11 03:20:00      0    00:00
2017-10-11 03:50:00      0    00:00
2017-10-11 04:20:00      0    00:00
2017-10-11 04:50:00      0    00:00
2017-10-11 05:20:00      0    00:00
2017-10-11 05:50:00      0    00:00
2017-10-11 06:00:00      0    00:00
2017-10-11 06:20:00      0    00:00
2017-10-11 06:50:00      0    00:00
2017-10-11 07:20:00      0    00:00
2017-10-11 07:50:00      1    00:30
2017-10-11 08:20:00      1    00:30
2017-10-11 08:50:00      1    00:10
2017-10-11 09:20:00      0    00:00
2017-10-11 09:50:00      0    00:00
2017-10-11 10:20:00      0    00:00