
Pandas Rolling Window & Datetime Indexes: What Does `offset` Mean?

The rolling window function pandas.DataFrame.rolling of pandas 0.22 takes a window argument that is described as follows: window : int, or offset Size of the moving window. This i

Solution 1:

In a nutshell, if you use an offset like "2D" (2 days), pandas will use the datetime info in the index (if available), potentially accounting for any missing rows or irregular frequencies. But if you use a simple int like 2, then pandas will treat the index as a simple integer index [0,1,2,...] and ignore any datetime info in the index.

A simple example should make this clear:

import pandas as pd

df = pd.DataFrame({'x': range(4)},
    index=pd.to_datetime(['1-1-2018', '1-2-2018', '1-4-2018', '1-5-2018']))

            x
2018-01-01  0
2018-01-02  1
2018-01-04  2
2018-01-05  3

Note that (1) the index is a datetime, but also (2) it is missing '2018-01-03'. So if you use a plain integer like 2, rolling will just look at the current row and the one before it, regardless of the datetime values (in a sense it's behaving like iloc[i-1:i+1], where i is the position of the current row):

df.rolling(2).count()

              x
2018-01-01  1.0
2018-01-02  2.0
2018-01-04  2.0
2018-01-05  2.0
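To make the positional behaviour concrete, here is a minimal sketch that compares an integer window against a hand-written iloc slice. The names `rolled` and `manual` are just illustrative, and `min_periods=1` is passed explicitly as an assumption so that the first, incomplete window is still evaluated regardless of the pandas version's defaults.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": range(4)},
    index=pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-04", "2018-01-05"]),
)

# Integer window: pandas slides over row positions and ignores the dates.
rolled = df["x"].rolling(2, min_periods=1).sum()

# Hand-rolled equivalent: the current row plus the one before it, by position.
manual = [df["x"].iloc[max(i - 1, 0): i + 1].sum() for i in range(len(df))]

print(rolled.tolist())  # [0.0, 1.0, 3.0, 5.0]
print(manual)           # same values, computed by position only
```

The gap between 2018-01-02 and 2018-01-04 has no effect here: the row for 2018-01-04 is still summed with the row for 2018-01-02.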

Conversely, if you use an offset of 2 days ('2D'), rolling will use the actual datetime values and account for any irregularities in the datetime index:

df.rolling('2D').count()

              x
2018-01-01  1.0
2018-01-02  2.0
2018-01-04  1.0
2018-01-05  2.0
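One way to see what the '2D' window covers is to rebuild it by hand with a boolean mask on the index. This is only a sketch: it assumes the default window closure for offsets, where each window spans the two days up to and including the current timestamp, and `window` and `manual` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": range(4)},
    index=pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-04", "2018-01-05"]),
)

window = pd.Timedelta("2D")

# Offset window: membership is decided by the timestamps in the index.
rolled = df["x"].rolling("2D").sum()

# Hand-rolled equivalent: keep rows whose timestamp lies in (t - 2 days, t].
manual = [
    df.loc[(df.index > t - window) & (df.index <= t), "x"].sum()
    for t in df.index
]

print(rolled.tolist())  # [0.0, 1.0, 2.0, 5.0]
print(manual)           # same values, selected by timestamp
```

The row for 2018-01-04 stands alone in its window because 2018-01-02 falls exactly on the excluded left edge and 2018-01-03 is missing.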

Also note that the index must be sorted in ascending order when using a date offset. With a simple integer the order doesn't matter, since the index is ignored anyway.
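The sorting requirement can be checked with a shuffled copy of the example frame. In this sketch, `shuffled` and `raised` are illustrative names, and catching `ValueError` reflects the error pandas is expected to raise for a non-monotonic index; the exact message may vary between versions.

```python
import pandas as pd

# Same data as above, but with the rows deliberately out of date order.
shuffled = pd.DataFrame(
    {"x": [2, 0, 3, 1]},
    index=pd.to_datetime(["2018-01-04", "2018-01-01", "2018-01-05", "2018-01-02"]),
)

# An offset window on an unsorted datetime index is rejected.
raised = False
try:
    shuffled.rolling("2D").count()
except ValueError as exc:
    raised = True
    print("offset window failed:", exc)

# Sorting the index first makes the offset window usable again.
result = shuffled.sort_index()["x"].rolling("2D").count()
print(result.tolist())  # [1.0, 2.0, 1.0, 2.0]

# An integer window accepts the unsorted frame, since it never reads the index.
by_position = shuffled["x"].rolling(2, min_periods=1).sum()
print(by_position.tolist())  # [2.0, 2.0, 3.0, 4.0]
```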
