
What Is The Point Of Views In Pandas If It Is Undefined Whether An Indexing Operation Returns A View Or A Copy?

I have switched from R to pandas. I routinely get SettingWithCopyWarnings when I do something like:

df_a = pd.DataFrame({'col1': [1,2,3,4]})
# Filtering step, which may or may

Solution 1:

Great question!

The short answer is: this is a flaw in pandas that's being remedied.

You can find a longer discussion of the nature of the problem here, but the main take-away is that we're now moving to a "copy-on-write" behavior in which any time you slice, you get a new copy, and you never have to think about views. The fix will soon come through this refactoring project. I actually tried to fix it directly (see here), but it just wasn't feasible in the current architecture.

In truth, we'll keep views in the background -- they make pandas SUPER memory efficient and fast when they can be provided -- but we'll end up hiding them from users so, from the user perspective, if you slice, index, or cut a DataFrame, what you get back will effectively be a new copy.

(This is accomplished by creating views when the user is only reading data, but whenever an assignment operation is used, the view will be converted to a copy before the assignment takes place.)
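To make the mechanism concrete, here is a minimal sketch. It assumes pandas 2.x, where copy-on-write is opt-in through the `mode.copy_on_write` option; the variable name `subset` and the value 100 are just for illustration.

```python
import pandas as pd

# Assumption: pandas 2.x, where copy-on-write is opt-in via this option.
pd.set_option("mode.copy_on_write", True)

df_a = pd.DataFrame({"col1": [1, 2, 3, 4]})

subset = df_a.iloc[1:3]   # reading only: subset may share memory with df_a
subset.iloc[0, 0] = 100   # writing: subset is copied first, so df_a is untouched

print(df_a["col1"].tolist())
print(subset["col1"].tolist())
```

With the option on, the write lands only in `subset`, and `df_a` keeps its original values.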

Best guess is the fix will land within a year -- in the meantime, I'm afraid some .copy() calls may be necessary, sorry!
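As a rough sketch of that interim workaround: the filter `col1 > 1` and the column name `newcol` are illustrative choices, not taken from the question. The idea is to call `.copy()` right after the filtering step, so it no longer matters whether that step returned a view or a copy.

```python
import pandas as pd

df_a = pd.DataFrame({"col1": [1, 2, 3, 4]})

# Explicit copy: df_b owns its data, so assigning to it cannot touch df_a
# and does not trigger SettingWithCopyWarning.
df_b = df_a[df_a["col1"] > 1].copy()
df_b["newcol"] = 2 * df_b["col1"]

print(df_b)
```

The cost is one extra copy of the filtered rows, which is usually acceptable unless the frame is very large.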

Solution 2:

I agree this is a bit funny. My current practice is to look for a "functional" method for whatever I want to do (in my experience these almost always exist with the exception of renaming columns and series). Sometimes it makes the code more elegant, sometimes it makes it worse (I don't like assign with lambda), but at least I don't have to worry about mutability.

So for indexing, instead of using the slice notation, you can use query, which will return a copy by default:

In [5]: df_a.query('col1 > 1')
Out[5]:
   col1
1     2
2     3
3     4

I expand on it a little in this blog post.

Edit: As raised in the comments, it looks like I'm wrong about query returning a copy by default. However, if you use the assign style, assign makes a copy before returning your result, and you're all good:

df_b = (df_a.query('col1 > 1')
            .assign(newcol = 2*df_a['col1']))
