
Python Pandas - Find Missing Rows, And Then Duplicate Another Row With Modifications

Have a data source where each row is uniquely defined by two columns. However, some rows are missing, which need to be inserted with some information from the dataframe. So with th

Solution 1:

Not sure if this is what you are after, but you can use the complete function from pyjanitor to expose the missing combinations. At the moment you have to install the latest development version from GitHub:

# install latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git
import janitor

df.complete(["A", "B"])

   A   B    C
0  1  10    A
1  1  20    D
2  2  10    B
3  2  20  NaN
4  3  10    C
5  3  20    E

Using pandas only, we can take the unique values of columns 'A' and 'B', build a new MultiIndex from every combination of them, then reindex the dataframe:

new_index = pd.MultiIndex.from_product([df.A.unique(), df.B.unique()], 
                                        names=["A", "B"])
new_index

MultiIndex([(1, 10),
            (1, 20),
            (2, 10),
            (2, 20),
            (3, 10),
            (3, 20)],
           names=['A', 'B'])

Now, set index, reindex and reset index:

df.set_index(["A", "B"]).reindex(new_index).reset_index()

   A   B    C
0  1  10    A
1  1  20    D
2  2  10    B
3  2  20  NaN
4  3  10    C
5  3  20    E
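The reindex approach above can be sketched end to end as follows. The input dataframe is not shown in this answer, so the one built here is an assumption reconstructed from the printed outputs (the A=2, B=20 combination is the missing one); the variable name completed is also just illustrative.

```python
import pandas as pd

# Hypothetical input, reconstructed from the outputs shown above:
# the (A=2, B=20) combination is absent.
df = pd.DataFrame(
    {
        "A": [1, 1, 2, 3, 3],
        "B": [10, 20, 10, 10, 20],
        "C": ["A", "D", "B", "C", "E"],
    }
)

# Every combination of the observed A and B values.
new_index = pd.MultiIndex.from_product(
    [df["A"].unique(), df["B"].unique()], names=["A", "B"]
)

# Reindexing inserts a NaN row for each combination not present in df.
completed = df.set_index(["A", "B"]).reindex(new_index).reset_index()
print(completed)
```

Note that reindex needs the (A, B) pairs to be unique, which matches the question's premise that each row is uniquely defined by the two columns.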

You can also fill the null value:

df.set_index(["A", "B"]).reindex(new_index, fill_value=0).reset_index()
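If you only want to see which combinations are missing before filling anything, one option is to compare the full product index against the pairs that already exist. This is a minimal sketch rather than part of the original answer; the sample dataframe and the names full_index, existing and missing are assumptions for illustration.

```python
import pandas as pd

# Hypothetical input, reconstructed from the outputs in this answer.
df = pd.DataFrame(
    {
        "A": [1, 1, 2, 3, 3],
        "B": [10, 20, 10, 10, 20],
        "C": ["A", "D", "B", "C", "E"],
    }
)

# All combinations of the observed A and B values.
full_index = pd.MultiIndex.from_product(
    [df["A"].unique(), df["B"].unique()], names=["A", "B"]
)

# The (A, B) pairs that are actually present in the data.
existing = pd.MultiIndex.from_frame(df[["A", "B"]])

# Combinations in the full product that do not occur in df.
missing = full_index.difference(existing)
print(missing.to_frame(index=False))
```

The resulting missing index could then be used to build the new rows separately, if they need values other than a constant fill.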

With the complete function, fill_value has to be passed as a dictionary mapping column names to fill values (or you could just use fillna afterwards and not worry about a dictionary):

df.complete(["A", "B"], fill_value={"C": 0}) # or df.complete(["A", "B"]).fillna(0)

   A   B  C
0  1  10  A
1  1  20  D
2  2  10  B
3  2  20  0
4  3  10  C
5  3  20  E
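A per-column fill is also possible with pandas alone, since DataFrame.fillna accepts a dictionary keyed by column name. The sketch below assumes the same reconstructed sample data as the outputs above; it is an illustration of the idea, not the pyjanitor implementation.

```python
import pandas as pd

# Hypothetical input, reconstructed from the outputs in this answer.
df = pd.DataFrame(
    {
        "A": [1, 1, 2, 3, 3],
        "B": [10, 20, 10, 10, 20],
        "C": ["A", "D", "B", "C", "E"],
    }
)

new_index = pd.MultiIndex.from_product(
    [df["A"].unique(), df["B"].unique()], names=["A", "B"]
)

# Fill only column C; any other columns would keep their NaN values.
completed = (
    df.set_index(["A", "B"])
    .reindex(new_index)
    .fillna({"C": 0})
    .reset_index()
)
print(completed)
```

Using a dictionary here keeps the fill value tied to a specific column, which matters once the dataframe has more than one value column with different sensible defaults.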
