Get First Row Of Dataframe In Python Pandas Based On Criteria

November 15, 2024

Let's say that I have a dataframe like this one:

import pandas as pd
df = pd.DataFrame([[1, 2, 1], [1, 3, 2], [4, 6, 3], [4, 3, 4], [5, 4, 5]], columns=['A', 'B', 'C'])

>>> df
   A  B  C
0  1  2  1
1  1  3  2
2  4  6  3
3  4  3  4
4  5  4  5

Solution 1:

This tutorial is a very good one for pandas slicing. Make sure you check it out. Onto some snippets... To slice a dataframe with a condition, you use this format:

>>> df[condition]

This will return a slice of your dataframe which you can index using iloc. Here are your examples:

Get first row where A > 3 (returns row 2)

>>> df[df.A > 3].iloc[0]
A    4
B    6
C    3
Name: 2, dtype: int64

If what you actually want is the row's index label rather than the row itself, use df[df.A > 3].index[0] instead of iloc. With the default integer index, that label is the same as the row number.
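The difference between the index label and the positional row number only shows up when the index is not the default 0..n-1 range. Here is a minimal sketch, assuming a relabelled copy `df2` (hypothetical, not part of the question) and using NumPy's `argmax` on the mask to get the position:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 1], [1, 3, 2], [4, 6, 3], [4, 3, 4], [5, 4, 5]],
                  columns=['A', 'B', 'C'])

# Hypothetical relabelled copy, only so that labels and positions differ.
df2 = df.copy()
df2.index = ['a', 'b', 'c', 'd', 'e']

mask = df2.A > 3                           # boolean Series, one entry per row
label = df2[mask].index[0]                 # index label of the first match
position = int(mask.to_numpy().argmax())   # 0-based position of the first True
```

Note that `argmax` returns 0 when the mask contains no True at all, so check `mask.any()` first if an empty result is possible.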

Get first row where A > 4 AND B > 3:

>>> df[(df.A > 4) & (df.B > 3)].iloc[0]
A    5
B    4
C    5
Name: 4, dtype: int64

Get first row where A > 3 AND (B > 3 OR C > 2) (returns row 2)

>>> df[(df.A > 3) & ((df.B > 3) | (df.C > 2))].iloc[0]
A    4
B    6
C    3
Name: 2, dtype: int64

Now, for your last case, we can write a function that falls back to the first row of the frame sorted in descending order when nothing matches:

>>> def series_or_default(X, condition, default_col, ascending=False):
...     sliced = X[condition]
...     if sliced.shape[0] == 0:
...         return X.sort_values(default_col, ascending=ascending).iloc[0]
...     return sliced.iloc[0]
>>>
>>> series_or_default(df, df.A > 6, 'A')
A    5
B    4
C    5
Name: 4, dtype: int64

As expected, it returns row 4.

Solution 2:

For existing matches, use query:

df.query('A > 3').head(1)
Out[33]: 
   A  B  C
2  4  6  3

df.query('A > 4 and B > 3').head(1)
Out[34]: 
   A  B  C
4  5  4  5

df.query('A > 3 and (B > 3 or C > 2)').head(1)
Out[35]: 
   A  B  C
2  4  6  3
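`query` can also reference Python variables with the `@` prefix, which avoids building the expression string by hand. Here is a small sketch, where `a_min` and `b_min` are illustrative names that do not appear in the original answer:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 1], [1, 3, 2], [4, 6, 3], [4, 3, 4], [5, 4, 5]],
                  columns=['A', 'B', 'C'])

# Hypothetical thresholds, kept in variables instead of the query string.
a_min, b_min = 4, 3

# head(1) yields a one-row DataFrame, or an empty one if nothing matches.
first = df.query('A > @a_min and B > @b_min').head(1)
```

Unlike `.iloc[0]`, `head(1)` does not raise on an empty result, so the caller should check `first.empty` before using it.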

Solution 3:

You can take care of the first three cases with Boolean slicing and head:

df[df.A > 3].head(1)
df[(df.A > 4) & (df.B > 3)].head(1)
df[(df.A > 3) & ((df.B > 3) | (df.C > 2))].head(1)

You can handle the case where nothing comes back with a try or an if:

try:
    output = df[df.A >= 6].head(1)
    assert len(output) == 1
except AssertionError:
    # Nothing matched: fall back to the row with the largest A
    output = df.sort_values('A', ascending=False).head(1)
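The `if` variant avoids using an exception for control flow. Here is a minimal sketch, with `first_or_default` as a hypothetical helper name:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 1], [1, 3, 2], [4, 6, 3], [4, 3, 4], [5, 4, 5]],
                  columns=['A', 'B', 'C'])

def first_or_default(frame, mask, default_col):
    """Return the first matching row as a one-row DataFrame, or the row
    with the largest value in default_col when nothing matches."""
    matches = frame[mask]
    if matches.empty:
        return frame.sort_values(default_col, ascending=False).head(1)
    return matches.head(1)

# No row has A >= 6, so this falls back to the row with the largest A.
output = first_or_default(df, df.A >= 6, 'A')
```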

Solution 4:

For the point that 'returns the value as soon as you find the first row/record that meets the requirements and NOT iterating other rows', the following code would work:

def pd_iter_func(df):
    for row in df.itertuples():
        # Define your criteria here
        if row.A > 4 and row.B > 3:
            return row

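On a large dataframe this can be more efficient than Boolean indexing when a match occurs early, because the loop stops at the first hit instead of evaluating every row. If the match is late or absent, the Python-level loop is usually slower than a vectorized mask.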

To make the function above more general, one can pass the criteria in as a lambda function:

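from typing import Callable, NamedTuple, Optional
from pandas import DataFrame

def pd_iter_func(df: DataFrame, criteria: Callable[[NamedTuple], bool]) -> Optional[NamedTuple]:
    for row in df.itertuples():
        if criteria(row):
            return row
    return None  # no row met the criteria

pd_iter_func(df, lambda row: row.A > 4 and row.B > 3)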

As mentioned in the answer to the 'mirror' question, pandas.Series.idxmax would also be a nice choice.

def pd_idxmax_func(df, mask):
    return df.loc[mask.idxmax()]

pd_idxmax_func(df, (df.A > 4) & (df.B > 3))
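One caveat worth checking: on an all-False mask, `idxmax` still returns the first index label, so `pd_idxmax_func` would hand back row 0 even though nothing matches. Here is a guarded sketch, where `pd_idxmax_or_none` is a hypothetical name and the index labels are assumed to be unique:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 1], [1, 3, 2], [4, 6, 3], [4, 3, 4], [5, 4, 5]],
                  columns=['A', 'B', 'C'])

def pd_idxmax_or_none(df, mask):
    # idxmax gives the label of the first True, but it also gives the
    # first label when every entry is False, so test mask.any() first.
    if not mask.any():
        return None
    return df.loc[mask.idxmax()]

row = pd_idxmax_or_none(df, (df.A > 4) & (df.B > 3))   # row labelled 4
missing = pd_idxmax_or_none(df, df.A > 6)              # nothing matches
```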
