Pandas: Efficient Way To Select Rows From A Dataframe Using Multiple Criteria
I am selecting/filtering a DataFrame using multiple criteria (comparison with variables), like so: results = df1[ (df1.Year == Year) & (df1.headline == text) &
Solution 1:
Your current approach is pretty by-the-book as far as Pandas syntax goes, in my personal opinion.
One way to optimize, if you really need to do so, is to use the underlying NumPy arrays for generating the boolean masks. Generally speaking, Pandas may come with a bit of additional overhead in how it overloads operators versus NumPy. (With the tradeoff being arguably greater flexibility and intrinsically smooth handling of NaN data.)
price = df1.price.values
promo = df1.promo.values

# Note: boolean-mask indexing returns a copy of the matching rows, not a view of df1
results = df1.loc[
    (df1.Year.values == Year) &
    (df1.headline.values == text) &
    (price > price1) &
    (price < price2) &
    (promo > promo1) &
    (promo < promo2)
]
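As a minimal, self-contained sketch of the approach above: the toy data and the filter values (Year, text, price1, price2, promo1, promo2) are made up for illustration, since the question does not show the real ones. It builds the mask both ways and shows that they select the same rows:

```python
import pandas as pd

# Hypothetical toy data; only the column names follow the question.
df1 = pd.DataFrame({
    "Year": [2019, 2020, 2020, 2020],
    "headline": ["sale", "sale", "sale", "launch"],
    "price": [5.0, 12.0, 30.0, 15.0],
    "promo": [0.10, 0.20, 0.20, 0.30],
})

# Hypothetical filter values (assumptions, not taken from the question).
Year, text = 2020, "sale"
price1, price2 = 10.0, 20.0
promo1, promo2 = 0.05, 0.25

# Mask built from pandas Series (the approach in the question).
mask_pd = (
    (df1.Year == Year) & (df1.headline == text)
    & (df1.price > price1) & (df1.price < price2)
    & (df1.promo > promo1) & (df1.promo < promo2)
)

# Mask built from the underlying arrays, skipping the Series operator overhead.
price = df1.price.values
promo = df1.promo.values
mask_np = (
    (df1.Year.values == Year) & (df1.headline.values == text)
    & (price > price1) & (price < price2)
    & (promo > promo1) & (promo < promo2)
)

results_pd = df1[mask_pd]
results_np = df1.loc[mask_np]

# Both masks pick out the same rows.
print(results_np.equals(results_pd))
```

Whether the NumPy version is measurably faster depends on the size of the frame and the dtypes involved, so it is worth timing both variants (for example with timeit) on your real data before committing to the less readable form.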
Secondly, check that you are already taking advantage of numexpr, which Pandas can use to speed up certain operations:
>>> import pandas as pd
>>> pd.get_option('compute.use_numexpr')  # use `pd.set_option()` if False
True
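If the option reports False, it can be switched on for the current session. A short sketch follows; note that, as far as I know, Pandas only actually dispatches to numexpr when the numexpr package is installed, and only for sufficiently large arrays, so setting the flag alone does not guarantee a speedup:

```python
import pandas as pd

# Ask Pandas to use numexpr where it can (this has no effect if numexpr is not installed).
pd.set_option("compute.use_numexpr", True)

# Read the option back to confirm the setting took effect.
use_numexpr = pd.get_option("compute.use_numexpr")
print(use_numexpr)
```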