Python Pandas - Find Missing Rows, And Then Duplicate Another Row With Modifications
I have a data source where each row is uniquely identified by two columns. However, some rows are missing and need to be inserted using some information from the dataframe. So with th
Solution 1:
Not sure if this is what you are after, but you can use the complete function from pyjanitor to expose the missing combinations. At the moment you have to install the latest development version from GitHub:
# install the latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git
import janitor

df.complete(["A", "B"])
   A   B    C
0  1  10    A
1  1  20    D
2  2  10    B
3  2  20  NaN
4  3  10    C
5  3  20    E
Using pandas alone, we can take the unique values of columns 'A' and 'B', build a new MultiIndex from every combination of them, then reindex the dataframe:
new_index = pd.MultiIndex.from_product([df.A.unique(), df.B.unique()],
                                       names=["A", "B"])
new_index
MultiIndex([(1, 10),
            (1, 20),
            (2, 10),
            (2, 20),
            (3, 10),
            (3, 20)],
           names=['A', 'B'])
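If you only want to see which combinations are missing before inserting anything, one possible approach is to take the set difference between the full product index and the pairs already present. The sketch below is not part of the original answer; the sample dataframe is an assumption reconstructed from the output shown in this answer, and the names `existing` and `missing` are illustrative.

```python
import pandas as pd

# Sample data reconstructed from the output shown in this answer (an assumption).
df = pd.DataFrame({
    "A": [1, 1, 2, 3, 3],
    "B": [10, 20, 10, 10, 20],
    "C": ["A", "D", "B", "C", "E"],
})

# Every combination of the unique values of A and B.
new_index = pd.MultiIndex.from_product(
    [df["A"].unique(), df["B"].unique()], names=["A", "B"]
)

# The (A, B) pairs that actually occur in the data.
existing = pd.MultiIndex.from_frame(df[["A", "B"]])

# Pairs in the full product that are absent from the data.
missing = new_index.difference(existing)
print(missing)
```

With this sample data the only missing pair should be A=2, B=20, which is the row that shows up as NaN after reindexing.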
Now, set index, reindex and reset index:
df.set_index(["A", "B"]).reindex(new_index).reset_index()
   A   B    C
0  1  10    A
1  1  20    D
2  2  10    B
3  2  20  NaN
4  3  10    C
5  3  20    E
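For anyone who wants to run the pandas-only steps end to end, here is a minimal self-contained sketch. The input dataframe is an assumption reconstructed from the output above, since the original data is not shown in full.

```python
import pandas as pd

# Sample data reconstructed from the output above (an assumption).
df = pd.DataFrame({
    "A": [1, 1, 2, 3, 3],
    "B": [10, 20, 10, 10, 20],
    "C": ["A", "D", "B", "C", "E"],
})

# Build the full set of (A, B) combinations.
new_index = pd.MultiIndex.from_product(
    [df["A"].unique(), df["B"].unique()], names=["A", "B"]
)

# Index by (A, B), expand to the full index, then restore A and B as columns.
# Combinations that were absent get NaN in the remaining columns.
result = df.set_index(["A", "B"]).reindex(new_index).reset_index()
print(result)
```

Note that reindex needs the (A, B) pairs to be unique, which matches the premise that those two columns uniquely define each row.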
You can also fill the null value:
df.set_index(["A", "B"]).reindex(new_index, fill_value=0).reset_index()
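Keep in mind that a scalar fill_value in reindex is applied to every column of the inserted rows. If the dataframe has several value columns that need different defaults, one option is to reindex first and then call fillna with a dictionary. The sketch below assumes a hypothetical extra numeric column "D" and the placeholder "missing", neither of which appears in the original question.

```python
import pandas as pd

# Sample data reconstructed from the output above, plus a hypothetical column D.
df = pd.DataFrame({
    "A": [1, 1, 2, 3, 3],
    "B": [10, 20, 10, 10, 20],
    "C": ["A", "D", "B", "C", "E"],
    "D": [1.5, 2.5, 3.5, 4.5, 5.5],  # hypothetical, for illustration only
})

new_index = pd.MultiIndex.from_product(
    [df["A"].unique(), df["B"].unique()], names=["A", "B"]
)

# Reindex to expose the missing rows, then fill each column with its own default.
filled = (
    df.set_index(["A", "B"])
      .reindex(new_index)
      .fillna({"C": "missing", "D": 0.0})
      .reset_index()
)
print(filled)
```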
With pyjanitor's complete function, fill_value must be a dictionary mapping column names to fill values (or you could just call fillna afterwards and not worry about a dictionary):
df.complete(["A", "B"], fill_value={"C": 0})
# or
df.complete(["A", "B"]).fillna(0)
   A   B  C
0  1  10  A
1  1  20  D
2  2  10  B
3  2  20  0
4  3  10  C
5  3  20  E