
Comparing Two Timeseries Dataframes Based On Some Conditions In Pandas

I have two timeseries dataframes df1 and df2: df1 = pd.DataFrame({'date_1':['10/11/2017 0:00','10/11/2017 03:00','10/11/2017 06:00','10/11/2017 09:00'], 'value_1'

Solution 1:

Input data:

>>> df1
                     value_1
date_1
2017-10-11 00:00:00   5000.0
2017-10-11 03:00:00   1500.0
2017-10-11 06:00:00   1200.0
2017-10-11 09:00:00      NaN

>>> df2
                     value_2
date_2
2017-10-11 00:00:00   1500.0
2017-10-11 00:30:00   2050.0
2017-10-11 00:50:00      NaN
2017-10-11 01:20:00   2400.0
2017-10-11 01:40:00   2500.0
...
2017-10-11 08:20:00   2400.0
2017-10-11 08:50:00   2600.0
2017-10-11 09:20:00      NaN
2017-10-11 09:50:00   8000.0
2017-10-11 10:20:00   9000.0
  1. Fill the NaN values in df2 by linear interpolation between the neighbouring rows t-1 and t+1:
import pandas as pd
df2['value_2'] = df2['value_2'].interpolate()
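Here is a minimal sketch of what step 1 does to a gap. It uses four of the df2 rows shown above. Note that interpolate() defaults to method='linear', which treats the rows as evenly spaced and ignores the timestamps. With method='time', each neighbour is weighted by elapsed time, so the two methods disagree when readings are irregular. Which one fits depends on your rules.

```python
import numpy as np
import pandas as pd

# Four df2 readings from the input above; the 00:50 value is missing.
idx = pd.DatetimeIndex(["2017-10-11 00:00", "2017-10-11 00:30",
                        "2017-10-11 00:50", "2017-10-11 01:20"], name="date_2")
s = pd.Series([1500.0, 2050.0, np.nan, 2400.0], index=idx, name="value_2")

# Default (method='linear'): rows are treated as evenly spaced,
# so the gap becomes the midpoint of its two neighbours.
linear = s.interpolate()

# method='time': 00:50 lies 20 of the 50 minutes between 00:30 and 01:20.
timed = s.interpolate(method="time")

print(linear.iloc[2])            # 2225.0
print(round(timed.iloc[2], 6))   # 2190.0
```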
  2. Create one interval per df1 timestamp t according to your rules, from t - 1:29 (exclusive) to t + 1:30 (inclusive):
ii = pd.IntervalIndex.from_tuples(
         list(zip(df1.index - pd.DateOffset(hours=1, minutes=29),
                  df1.index + pd.DateOffset(hours=1, minutes=30)))
     )
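This sketch checks the edges of the step 2 windows on df1's four timestamps. I build the windows with pd.Timedelta instead of pd.DateOffset, which I am assuming is equivalent for tz-naive timestamps like these. The owners() helper is hypothetical and exists only for the illustration. The intervals are open on the left, so 01:30 belongs to the first window, while 01:31 falls in no window at all. That leaves a one-minute gap between neighbouring windows.

```python
import pandas as pd

# df1's four timestamps from the input above.
t = pd.DatetimeIndex(["2017-10-11 00:00", "2017-10-11 03:00",
                      "2017-10-11 06:00", "2017-10-11 09:00"])

# One window per timestamp: (t - 1h29m, t + 1h30m], closed on the right,
# which is also the default of IntervalIndex.from_tuples.
ii = pd.IntervalIndex.from_arrays(t - pd.Timedelta(hours=1, minutes=29),
                                  t + pd.Timedelta(hours=1, minutes=30),
                                  closed="right")

def owners(ts):
    """Hypothetical helper: positions of the windows that contain ts."""
    ts = pd.Timestamp(ts)
    return [i for i, window in enumerate(ii) if ts in window]

print(ii.is_overlapping)           # False, so pd.cut accepts these bins
print(owners("2017-10-11 01:30"))  # [0]: the right edge is included
print(owners("2017-10-11 01:31"))  # []: window 1 is open at 01:31
print(owners("2017-10-11 01:40"))  # [1]
```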
  3. Assign each df1 and df2 timestamp to its interval with pd.cut:
df1['interval'] = pd.cut(df1.index, bins=ii)
df2['interval'] = pd.cut(df2.index, bins=ii)
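When pd.cut gets an IntervalIndex as bins, it returns a Categorical whose codes are positions in that IntervalIndex. A timestamp that no window covers gets code -1, which shows up as NaN. Here is a small sketch with the same windows. In it, 00:30 and 01:40 are df2 readings, while 01:31 and 10:50 are made-up timestamps chosen only to hit the gaps.

```python
import pandas as pd

t = pd.DatetimeIndex(["2017-10-11 00:00", "2017-10-11 03:00",
                      "2017-10-11 06:00", "2017-10-11 09:00"])
# Same windows as step 2; from_arrays also defaults to closed='right'.
ii = pd.IntervalIndex.from_arrays(t - pd.Timedelta(minutes=89),
                                  t + pd.Timedelta(minutes=90))

# Two real df2 readings and two illustrative timestamps outside every window.
stamps = pd.DatetimeIndex(["2017-10-11 00:30", "2017-10-11 01:31",
                           "2017-10-11 01:40", "2017-10-11 10:50"])
binned = pd.cut(stamps, bins=ii)

# Each code is the position of the matching interval in ii; -1 means NaN.
print(binned.codes.tolist())   # [0, -1, 1, -1]
print(binned.isna().tolist())  # [False, True, False, True]
```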
  4. Merge the two dataframes on interval; count is 1 where value_2 < 2800 and value_1 > 1600, and 0 otherwise:
dfx = pd.merge(df2, df1, on='interval', how='left').set_index('interval')
dfx = (dfx['value_2'].lt(2800) & dfx['value_1'].gt(1600)) \
          .astype(int).to_frame('count').set_index(df2.index)
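Steps 3 and 4 can be checked end to end on a slice of the input: two df1 rows and four df2 rows. This sketch keeps the flags in a plain Series instead of using the set_index chain above. It relies on a left merge preserving df2's row order, which holds as long as each interval matches at most one df1 row.

```python
import pandas as pd

# Two df1 rows and four df2 rows taken from the input above.
df1 = pd.DataFrame(
    {"value_1": [5000.0, 1500.0]},
    index=pd.DatetimeIndex(["2017-10-11 00:00", "2017-10-11 03:00"], name="date_1"),
)
df2 = pd.DataFrame(
    {"value_2": [1500.0, 2050.0, 2400.0, 2500.0]},
    index=pd.DatetimeIndex(["2017-10-11 00:00", "2017-10-11 00:30",
                            "2017-10-11 01:20", "2017-10-11 01:40"], name="date_2"),
)

# Windows (t - 1h29m, t + 1h30m] around each df1 timestamp.
ii = pd.IntervalIndex.from_arrays(df1.index - pd.Timedelta(minutes=89),
                                  df1.index + pd.Timedelta(minutes=90))
df1["interval"] = pd.cut(df1.index, bins=ii)
df2["interval"] = pd.cut(df2.index, bins=ii)

# Left merge: every df2 row picks up the value_1 of its window.
merged = df2.merge(df1, on="interval", how="left")
count = (merged["value_2"].lt(2800) & merged["value_1"].gt(1600)).astype(int)
count.index = df2.index  # row order is preserved, so df2's index lines up

print(count.tolist())  # [1, 1, 1, 0]: 01:40 falls in the 03:00 window (1500)
```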
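  5. Add a 90-minute grid spanning df1's index (00:00, 01:30, ..., 09:00, which includes the interval boundaries 01:30, 04:30 and 07:30) to df2's index, then forward-fill the counts: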
dti = df2.index.append(
          pd.DatetimeIndex(df1.index.to_series().resample('90min').groups.keys())
      ).sort_values().drop_duplicates()
dfx = dfx.reindex(dti).ffill().astype(int)
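For step 5, pd.date_range builds a 90-minute grid more directly than collecting resample group keys. That substitution is my own, not the answer's code. DatetimeIndex.union sorts and de-duplicates in a single call. The counts used here are the first five rows of the output shown further down.

```python
import pandas as pd

df1_index = pd.DatetimeIndex(["2017-10-11 00:00", "2017-10-11 03:00",
                              "2017-10-11 06:00", "2017-10-11 09:00"])

# Per-row counts for the first five df2 rows (taken from the output).
count = pd.Series([1, 1, 1, 1, 0],
                  index=pd.DatetimeIndex(["2017-10-11 00:00", "2017-10-11 00:30",
                                          "2017-10-11 00:50", "2017-10-11 01:20",
                                          "2017-10-11 01:40"]),
                  name="count")

# 90-minute grid from the first to the last df1 timestamp:
# 00:00, 01:30, 03:00, 04:30, 06:00, 07:30, 09:00.
grid = pd.date_range(df1_index.min(), df1_index.max(), freq="90min")

# union() sorts and drops duplicates (00:00 is in both indexes).
dti = count.index.union(grid)

# Each grid point inherits the count of the last reading before it.
filled = count.reindex(dti).ffill().astype(int)
print(filled[pd.Timestamp("2017-10-11 01:30")])  # 1, carried over from 01:20
```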
  6. Compute each row's duration as the time until the next timestamp (00:00 where count is 0), then reindex back to df2's index:
dfx['duration'] = dfx.index.to_series().diff(-1).abs() \
                     .fillna(pd.Timedelta(0)).dt.components \
                     .apply(lambda x: f"{x['hours']:02}:{x['minutes']:02}",
                            axis='columns')

dfx.loc[dfx['count'] == 0, 'duration'] = '00:00'
dfx = dfx.reindex(df2.index)

Output result:

>>> dfx
                     count duration
date_2
2017-10-11 00:00:00      1    00:30
2017-10-11 00:30:00      1    00:20
2017-10-11 00:50:00      1    00:30
2017-10-11 01:20:00      1    00:10
2017-10-11 01:40:00      0    00:00
2017-10-11 02:20:00      0    00:00
2017-10-11 02:50:00      0    00:00
2017-10-11 03:00:00      0    00:00
2017-10-11 03:20:00      0    00:00
2017-10-11 03:50:00      0    00:00
2017-10-11 04:20:00      0    00:00
2017-10-11 04:50:00      0    00:00
2017-10-11 05:20:00      0    00:00
2017-10-11 05:50:00      0    00:00
2017-10-11 06:00:00      0    00:00
2017-10-11 06:20:00      0    00:00
2017-10-11 06:50:00      0    00:00
2017-10-11 07:20:00      0    00:00
2017-10-11 07:50:00      1    00:30
2017-10-11 08:20:00      1    00:30
2017-10-11 08:50:00      1    00:10
2017-10-11 09:20:00      0    00:00
2017-10-11 09:50:00      0    00:00
2017-10-11 10:20:00      0    00:00
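Here is a sketch of step 6 on the first five df2 rows plus the 01:30 window boundary that step 5 inserts. It makes two substitutions of my own. First, it computes the gap as shift(-1) minus the current time instead of diff(-1).abs(). Second, it formats with Timedelta.total_seconds() instead of .dt.components. One side effect of the second change is that a gap longer than a day keeps its full hour count, whereas the components 'hours' field drops whole days.

```python
import pandas as pd

df2_index = pd.DatetimeIndex(["2017-10-11 00:00", "2017-10-11 00:30",
                              "2017-10-11 00:50", "2017-10-11 01:20",
                              "2017-10-11 01:40"])
# Add the 01:30 window boundary; after forward-filling it inherits 1 from 01:20.
idx = df2_index.union(pd.DatetimeIndex(["2017-10-11 01:30"]))
dfx = pd.DataFrame({"count": [1, 1, 1, 1, 1, 0]}, index=idx)

# Time until the next timestamp; the last row has none, so use zero.
ts = idx.to_series()
gap = (ts.shift(-1) - ts).fillna(pd.Timedelta(0))

def hhmm(td):
    """Format a Timedelta as HH:MM from its total length in seconds."""
    secs = int(td.total_seconds())
    return f"{secs // 3600:02d}:{secs % 3600 // 60:02d}"

# Rows whose condition failed get a zero duration.
dfx["duration"] = gap.map(hhmm).where(dfx["count"].ne(0), "00:00")

# Drop the helper boundary row again by reindexing to df2's timestamps.
result = dfx.reindex(df2_index)
print(result["duration"].tolist())  # ['00:30', '00:20', '00:30', '00:10', '00:00']
```

These durations match the first five rows of the output above.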
