
Pandas: Rapidly Calculating Sum Of Column With Certain Values

I have a pandas dataframe and I need to calculate the sum of a column of values that fall within a certain window. So for instance, if I have a window of 500, and my initial value

Solution 1:

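The trick is to drop down to NumPy arrays. Pandas indexing and slicing carry noticeable per-call overhead, so extract each column once with .values and do the filtering on the raw arrays.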

import pandas as pd

df = pd.DataFrame([[1, 10177, 0.5], [1, 10178, 0.2], [1, 20178, 0.1],
                   [2, 10180, 0.3], [1, 10180, 0.4]], columns=['chrom', 'pos', 'AFR'])

chrom = df['chrom'].values
pos = df['pos'].values
afr = df['AFR'].values

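def filter_sum(chrom_arr, pos_arr, afr_arr, chrom_val, pos_start, pos_end):
    # Sum AFR over rows on chrom_val whose position lies strictly
    # between pos_start and pos_end (both bounds are exclusive).
    return sum(k for i, j, k in zip(chrom_arr, pos_arr, afr_arr)
               if pos_start < j < pos_end and i == chrom_val)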

filter_sum(chrom, pos, afr, 1, 10150, 10200)

# 1.1
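
The function above still loops in Python over the zipped arrays. As a possible variation (not part of the original answer), the same filter can be written as a NumPy boolean mask so the comparison and the sum both run in compiled code. This is a minimal sketch: the name filter_sum_vectorized is hypothetical, it assumes a pandas version that provides .to_numpy(), and whether it is actually faster depends on the array sizes and how many windows you query.

```python
import pandas as pd

df = pd.DataFrame([[1, 10177, 0.5], [1, 10178, 0.2], [1, 20178, 0.1],
                   [2, 10180, 0.3], [1, 10180, 0.4]], columns=['chrom', 'pos', 'AFR'])

# Extract the columns once as plain NumPy arrays.
chrom = df['chrom'].to_numpy()
pos = df['pos'].to_numpy()
afr = df['AFR'].to_numpy()

def filter_sum_vectorized(chrom_arr, pos_arr, afr_arr, chrom_val, pos_start, pos_end):
    # Hypothetical helper: build one boolean mask (same chromosome, position
    # strictly inside the window) and sum the selected AFR values.
    mask = (chrom_arr == chrom_val) & (pos_arr > pos_start) & (pos_arr < pos_end)
    return afr_arr[mask].sum()

total = filter_sum_vectorized(chrom, pos, afr, 1, 10150, 10200)
print(total)
```

The bounds are exclusive on both ends, matching filter_sum, so the two functions should agree on the sample frame up to floating-point rounding.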
