Pandas: Rapidly Calculating Sum Of Column With Certain Values
I have a pandas dataframe and I need to calculate the sum of a column of values that fall within a certain window. So for instance, if I have a window of 500, and my initial value
Solution 1:
The trick is to drop down to NumPy arrays. Pandas indexing and slicing carry noticeable per-call overhead, so repeated lookups on a DataFrame are slow.
import pandas as pd

df = pd.DataFrame([[1, 10177, 0.5], [1, 10178, 0.2], [1, 20178, 0.1],
                   [2, 10180, 0.3], [1, 10180, 0.4]],
                  columns=['chrom', 'pos', 'AFR'])

# Pull each column out once as a plain NumPy array.
chrom = df['chrom'].values
pos = df['pos'].values
afr = df['AFR'].values

def filter_sum(chrom_arr, pos_arr, afr_arr, chrom_val, pos_start, pos_end):
    # Sum AFR for rows on chrom_val whose position lies strictly
    # between pos_start and pos_end (both bounds exclusive).
    return sum(k for i, j, k in zip(chrom_arr, pos_arr, afr_arr)
               if pos_start < j < pos_end and i == chrom_val)

filter_sum(chrom, pos, afr, 1, 10150, 10200)
# 1.1
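The loop above still visits every row in Python. As a rough sketch of a fully vectorized alternative (not part of the original answer), the same filter can be written as a NumPy boolean mask. The function name filter_sum_vectorized is made up for illustration, and the bounds are assumed to be exclusive, matching filter_sum:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 10177, 0.5], [1, 10178, 0.2], [1, 20178, 0.1],
                   [2, 10180, 0.3], [1, 10180, 0.4]],
                  columns=['chrom', 'pos', 'AFR'])

# Extract the columns once as NumPy arrays.
chrom = df['chrom'].to_numpy()
pos = df['pos'].to_numpy()
afr = df['AFR'].to_numpy()

def filter_sum_vectorized(chrom_arr, pos_arr, afr_arr, chrom_val, pos_start, pos_end):
    # Hypothetical helper: build one boolean mask for the whole array,
    # then sum only the selected AFR values. Bounds are exclusive.
    mask = (chrom_arr == chrom_val) & (pos_arr > pos_start) & (pos_arr < pos_end)
    return afr_arr[mask].sum()

total = filter_sum_vectorized(chrom, pos, afr, 1, 10150, 10200)
print(total)
```

On the sample data this selects the same three rows as filter_sum (positions 10177, 10178 and 10180 on chrom 1). Whether it is actually faster depends on the array size, so it is worth timing both on real data.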