
Fastest Way To Combine Two Slices From Two Pandas Dataframes In A Loop?

I have a list of person IDs, and for each ID, I want to extract all available information from two different dataframes. In addition, the types of information also have IDs, and I

Solution 1:

I took Parfait's advice and first concatenated both dataframes into one; a coworker then gave me a way to iterate through the result. The combined dataframe had ~117M rows and ~246K person IDs. My coworker's solution was to build a dictionary in which each key is a person ID and each value is the list of row positions for that person in the dataframe. You then slice the dataframe with .iloc using those positions, which avoids re-filtering the whole dataframe for every ID. The loop finished in about one hour.

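# Map each person ID to the row positions of that person's rows.
# Assumes df1 has a default 0..n-1 index, so the 'index' column holds positions.
idx = df1['person'].reset_index().groupby('person')['index'].apply(list).to_dict()

# Slice once per person using the precomputed positions.
for person_id, rows in idx.items():
    mrn_slice = df1.iloc[rows]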
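
For reference, here is a minimal, self-contained sketch of the same idea on toy data. The two small dataframes, their column names (`info_type`, `value`), and the variable names are made up for illustration and are not from the original post. It also swaps the hand-built dictionary for pandas' built-in `GroupBy.indices` attribute, which returns a dictionary of row positions per group; whether that is faster on a dataframe of this size has not been measured here.

```python
import pandas as pd

# Hypothetical stand-ins for the two source dataframes (column names are assumptions).
df_a = pd.DataFrame({"person": [1, 2, 1], "info_type": [10, 10, 11], "value": ["a", "b", "c"]})
df_b = pd.DataFrame({"person": [2, 1, 3], "info_type": [20, 21, 20], "value": ["d", "e", "f"]})

# Concatenate once up front; ignore_index=True yields a clean 0..n-1 index.
combined = pd.concat([df_a, df_b], ignore_index=True)

# Build {person ID -> array of row positions} in a single pass.
positions = combined.groupby("person").indices

# Slice per person with .iloc, reusing the precomputed positions.
slices = {person_id: combined.iloc[rows] for person_id, rows in positions.items()}
```

Each entry in `slices` holds every row for one person from both original dataframes, so any per-person processing can run on `slices[person_id]` without scanning the full dataframe again.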
