Handling Pandas safely

A lot of Pandas’ design is for speed and efficiency.

Unfortunately, this sometimes means that it is easy to use Pandas incorrectly, and so get results that you do not expect.

If you have Pandas version 1.5 or later, you can skip this page

This page discusses the problems that can come up when Pandas keeps links between different DataFrames and Series. As you will see below, this is the issue of Pandas copies and views.

Luckily, as of Pandas version 1.5, there is an option you can enable that will allow you to avoid this rather complicated distinction, and, if you have a Pandas version of 1.5 or greater, we strongly suggest you enable that option, like this:

import pandas as pd
pd.set_option('mode.copy_on_write', True)

You will see that option in all the notebooks from this course, and, if you can, we suggest you set that option whenever you import and use Pandas.
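If a notebook has to run on several Pandas versions, one possible approach is to enable the option only where it is available. This is a sketch under stated assumptions, not an official recipe: the helper name `enable_copy_on_write` is hypothetical, it assumes `pd.__version__` starts with `major.minor`, and it does not set the option on Pandas 3.0 and later, because the Pandas warnings quoted further down this page say copy-on-write becomes the default there.

```python
import pandas as pd


def enable_copy_on_write():
    """Hypothetical helper: switch on copy-on-write if this Pandas has it.

    Returns True if copy-on-write should now be in effect, else False.
    Assumes pd.__version__ starts with 'major.minor', e.g. '2.2.3'.
    """
    major, minor = (int(part) for part in pd.__version__.split('.')[:2])
    if (major, minor) >= (3, 0):
        # Per the Pandas warnings quoted on this page, 3.0 copies-on-write by default.
        return True
    if (major, minor) >= (1, 5):
        pd.set_option('mode.copy_on_write', True)
        return True
    # Older Pandas has no such option; use the three rules further down instead.
    return False


print(enable_copy_on_write())
```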

You will see more details about what the option means further down this page, so read on if you are interested.

Avoiding trouble

The rest of this page has some background on the issue of Pandas copies and views, and an explanation of the problems that can come up for older Pandas, or when you do not enable the mode.copy_on_write option. We explain the mode.copy_on_write option, and give some rules to help you stay out of trouble, if you cannot use mode.copy_on_write.

Background: copies and views

Consider this DataFrame, which should be familiar. It is a table where the rows are course subjects and the columns include average ratings for all University professors / lecturers teaching that subject. See the dataset page for more detail.

import pandas as pd

Notice that we have not yet enabled the mode.copy_on_write option.

We get the ratings:

all_ratings = pd.read_csv('rate_my_course.csv')

To ease some later exposition, we select the first 10 rows, and set the row labels (index) to be letters rather than numbers:

ratings = all_ratings.iloc[:10]
ratings.index = list('ABCDEFGHIJ')
ratings
          Discipline  Number of Professors   Clarity  Helpfulness  Overall Quality  Easiness
A            English                 23343  3.756147     3.821866         3.791364  3.162754
B        Mathematics                 22394  3.487379     3.641526         3.566867  3.063322
C            Biology                 11774  3.608331     3.701530         3.657641  2.710459
D         Psychology                 11179  3.909520     3.887536         3.900949  3.316210
E            History                 11145  3.788818     3.753642         3.773746  3.053803
F          Chemistry                  7346  3.387174     3.538980         3.465485  2.652054
G     Communications                  6940  3.867349     3.878602         3.875019  3.379829
H           Business                  6120  3.640327     3.680503         3.663332  3.172033
I  Political Science                  5824  3.759018     3.748676         3.756197  3.057758
J          Economics                  5540  3.382735     3.483617         3.435038  2.910078

Now imagine that we have discovered that the rating for ‘Clarity’ in the first row is incorrect; it should be 4.0.

We get ready to make a new, fixed copy of the DataFrame, to store the modified values. We put the ‘Discipline’ column into the DataFrame to start with.

fixed_ratings = pd.DataFrame()
fixed_ratings['Discipline'] = ratings['Discipline']

Our next obvious step is to get the ‘Clarity’ column as a Pandas Series, for us to work on.

clarity = ratings['Clarity']
clarity.head()
A    3.756147
B    3.487379
C    3.608331
D    3.909520
E    3.788818
Name: Clarity, dtype: float64

We set the corrected first value:

clarity.loc['A'] = 4
clarity.head()
/tmp/ipykernel_6218/111022653.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  clarity.loc['A'] = 4
A    4.000000
B    3.487379
C    3.608331
D    3.909520
E    3.788818
Name: Clarity, dtype: float64

Notice the warning. We will come back to that soon.

Notice too that we have changed the value in the clarity Series.

Consider — what happens to the matching value in the original DataFrame?

To answer that question, we need to know what kind of thing our clarity Series was. If you have not enabled mode.copy_on_write, clarity could be a copy or a view.

If the clarity Series is a view, then it still refers directly to the ‘Clarity’ column in the original data frame ratings. A view is something that points to the same memory. When we have a view, the view is another way of looking at the same data. If we modify the data in the view, that means we also modify the original DataFrame, because the data is the same.

clarity could also be a copy of the ‘Clarity’ column. A copy duplicates the values from the original data. Therefore a copy has its own values, and its own memory. Changing the data in the copy will have no effect on the original DataFrame, because the data is different.
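NumPy, the array library that Pandas builds on, has the same copy/view distinction, and there it is easier to predict: basic slicing returns a view, and `.copy()` returns a copy. This small sketch uses three Clarity values from the table above and checks both cases with `np.shares_memory`. It illustrates the idea only; it does not tell you what any particular Pandas indexing operation returns.

```python
import numpy as np

# Three Clarity values from the table above.
values = np.array([3.756147, 3.487379, 3.608331])

view_part = values[:2]          # Basic slicing gives a view onto `values`.
copy_part = values[:2].copy()   # .copy() duplicates the data into new memory.

view_part[0] = 4.0              # Writes through to `values`.
copy_part[1] = 99.0             # Changes only `copy_part`.

print(values[0])                             # 4.0
print(values[1])                             # 3.487379 (untouched by copy_part)
print(np.shares_memory(values, view_part))   # True
print(np.shares_memory(values, copy_part))   # False
```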

Note: if you have enabled mode.copy_on_write, clarity will always (effectively) be a copy, and you will not see the behavior below.

ratings.head()
    Discipline  Number of Professors   Clarity  Helpfulness  Overall Quality  Easiness
A      English                 23343  4.000000     3.821866         3.791364  3.162754
B  Mathematics                 22394  3.487379     3.641526         3.566867  3.063322
C      Biology                 11774  3.608331     3.701530         3.657641  2.710459
D   Psychology                 11179  3.909520     3.887536         3.900949  3.316210
E      History                 11145  3.788818     3.753642         3.773746  3.053803

We have found that the clarity Series was a view, because the change we made to clarity also changed the value in the original DataFrame.

This is probably not what you expected; you most likely did not mean to change the original data.

There are two basic strategies for dealing with this problem.

New Strategy (Pandas >= 1.5): automatic copies when needed

This strategy uses a feature that is new in Pandas version 1.5.

The summary is — always put the following line after you import Pandas, and before you execute any code using Pandas:

# Ask Pandas to make a copy under the hood, when needed.
pd.set_option('mode.copy_on_write', True)

After you apply this option, Pandas uses an algorithm to work out when to make a copy. You can think of the option as making the result of every indexing operation behave like a copy. For example, consider the problem we had above.

# The current values of the `ratings` DataFrame.
ratings.head()
    Discipline  Number of Professors   Clarity  Helpfulness  Overall Quality  Easiness
A      English                 23343  4.000000     3.821866         3.791364  3.162754
B  Mathematics                 22394  3.487379     3.641526         3.566867  3.063322
C      Biology                 11774  3.608331     3.701530         3.657641  2.710459
D   Psychology                 11179  3.909520     3.887536         3.900949  3.316210
E      History                 11145  3.788818     3.753642         3.773746  3.053803
# A column from the DataFrame.
clarity = ratings['Clarity']
clarity.head()
A    4.000000
B    3.487379
C    3.608331
D    3.909520
E    3.788818
Name: Clarity, dtype: float64

As before, we set another corrected first value:

clarity.loc['A'] = 99
clarity.head()
A    99.000000
B     3.487379
C     3.608331
D     3.909520
E     3.788818
Name: Clarity, dtype: float64

We set clarity as we expected. But this time, with the mode.copy_on_write option, we did not change the ratings DataFrame from which we selected the clarity values.

ratings.head()
    Discipline  Number of Professors   Clarity  Helpfulness  Overall Quality  Easiness
A      English                 23343  4.000000     3.821866         3.791364  3.162754
B  Mathematics                 22394  3.487379     3.641526         3.566867  3.063322
C      Biology                 11774  3.608331     3.701530         3.657641  2.710459
D   Psychology                 11179  3.909520     3.887536         3.900949  3.316210
E      History                 11145  3.788818     3.753642         3.773746  3.053803

Notice that the first Clarity value in ratings did not change — it is still 4 and not 99.

The value in ratings did not change because you can think of the ratings['Clarity'] expression as always taking a copy not a view [1].
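Here is a self-contained sketch of the same experiment, assuming a Pandas version where copy-on-write is available. The two-row DataFrame is a hypothetical stand-in for `ratings`, built from values in the table above so that the code runs without `rate_my_course.csv`. Copy-on-write arrived in Pandas 1.5 and was developed further in the 2.x series, so the behavior shown is most dependable on 2.x and later.

```python
import pandas as pd

# Pandas 3.0 copies-on-write by default; on 1.5 to 2.x we switch it on.
if int(pd.__version__.split('.')[0]) < 3:
    pd.set_option('mode.copy_on_write', True)

# Hypothetical two-row stand-in for `ratings`, using values from the table above.
ratings = pd.DataFrame(
    {'Discipline': ['English', 'Mathematics'],
     'Clarity': [3.756147, 3.487379]},
    index=['A', 'B'])

clarity = ratings['Clarity']   # Behaves as a copy under copy-on-write.
clarity.loc['A'] = 99          # Triggers the actual copy; changes only `clarity`.

print(clarity.loc['A'])              # 99.0
print(ratings.loc['A', 'Clarity'])   # 3.756147
```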

If you have Pandas >= 1.5, we strongly suggest you apply this strategy. And in fact, you will see that all the notebooks in this course that import pandas also have the magic line:

pd.set_option('mode.copy_on_write', True)

“Chained assignment” and copy-on-write

Remember, we have mode.copy_on_write enabled here.

Consider the following code.

row_A = ratings.loc['A']  # Effectively, a *copy* of row labeled A.
row_A.loc['Clarity'] = 199

Sure enough, you have set the row_A Clarity value:

row_A
Discipline               English
Number of Professors       23343
Clarity                      199
Helpfulness             3.821866
Overall Quality         3.791364
Easiness                3.162754
Name: A, dtype: object

At this stage, with mode.copy_on_write enabled, you would expect the first row of ratings to stay the same, because Pandas effectively copies the first row, before doing the assignment into the copy. And you’d be right to expect that.

ratings.loc['A']
Discipline               English
Number of Professors       23343
Clarity                      4.0
Helpfulness             3.821866
Overall Quality         3.791364
Easiness                3.162754
Name: A, dtype: object

But — you may sometimes fail to think of this copy, and be surprised at the result. For example, consider the following code:

# "Chained assignment".
# Assigning a value to a chain of fetched values.
ratings.loc['A'].loc['Clarity'] = 199
ratings.loc['A']
/tmp/ipykernel_6218/4222313501.py:3: ChainedAssignmentError: A value is trying to be set on a copy of a DataFrame or Series through chained assignment.
When using the Copy-on-Write mode, such chained assignment never works to update the original DataFrame or Series, because the intermediate object on which we are setting values always behaves as a copy.

Try using '.loc[row_indexer, col_indexer] = value' instead, to perform the assignment in a single step.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ratings.loc['A'].loc['Clarity'] = 199
Discipline               English
Number of Professors       23343
Clarity                      4.0
Helpfulness             3.821866
Overall Quality         3.791364
Easiness                3.162754
Name: A, dtype: object

Notice that here, ratings.loc['A'] does not change.

This kind of code is sometimes called chained assignment because you are chaining the fetch of the values on the left-hand side. First you are fetching ratings.loc['A'] and then, from the result, you are fetching the Clarity value. Then you are assigning to this chain of fetched values.

Chained assignment can be confusing, because the first line in the cell above looks as if it sets the Clarity value for the row labeled ‘A’. In fact, the code is exactly equivalent to the code cells just above it, and has the same effect. That is, ratings.loc['A'] effectively results in a copy, so ratings.loc['A'].loc['Clarity'] = 199 sets the Clarity value to 199 in the copy, which Python then immediately discards, because you are not storing the copy anywhere. So, if you are not careful, you may think you are modifying the underlying ratings DataFrame when you are not, because of the internal copying implied by mode.copy_on_write.
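To check the claim that the one-line chained form and the explicit two-step form have the same effect, this sketch runs both on a hypothetical two-row stand-in for `ratings` (values from the table above) with copy-on-write switched on. It hides the `ChainedAssignmentError` warning so the output stays short; treat it as an illustration under those assumptions.

```python
import warnings

import pandas as pd

# Pandas 3.0 copies-on-write by default; on 1.5 to 2.x we switch it on.
if int(pd.__version__.split('.')[0]) < 3:
    pd.set_option('mode.copy_on_write', True)

# Hypothetical stand-in for `ratings` after the earlier fix (Clarity of A is 4.0).
ratings = pd.DataFrame(
    {'Discipline': ['English', 'Mathematics'],
     'Clarity': [4.0, 3.487379]},
    index=['A', 'B'])

with warnings.catch_warnings():
    # Hide the ChainedAssignmentError warning shown above.
    warnings.simplefilter('ignore')
    # Form 1: chained assignment in a single line.
    ratings.loc['A'].loc['Clarity'] = 199

# Form 2: the same thing spelled out in two steps.
tmp = ratings.loc['A']      # Behaves as a copy of row A.
tmp.loc['Clarity'] = 199    # Sets the value in that copy only.

print(tmp.loc['Clarity'])            # 199
print(ratings.loc['A', 'Clarity'])   # 4.0: neither form changed ratings.
```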

If you do want to set the Clarity value of the row labeled ‘A’, you need to have a left hand side that does not use the chaining that you see above. To do this, specify the row and column in a single left-hand-side expression, like this:

# "Unchained assignment".
# Assigning a value directly to a fetched value, no chain.
ratings.loc['A', 'Clarity'] = 199  # No chain on the left-hand side.
ratings.loc['A']
Discipline               English
Number of Professors       23343
Clarity                    199.0
Helpfulness             3.821866
Overall Quality         3.791364
Easiness                3.162754
Name: A, dtype: object

Old Strategy (for Pandas < 1.5): three simple rules

But now we return to the older, darker world of Pandas < 1.5, where you cannot enable mode.copy_on_write. What should you do then? In the rest of the page, we suggest and explain three simple rules to stay out of trouble.

As your understanding increases, you may find that you can relax some of these rules, but the problems in this page can trip up experts, so please, be very careful, and only relax these rules when you are very confident you understand the underlying problems. See Gory Pandas for a short walk through some of the complexities.

To make the rest of this notebook behave more like older Pandas, we turn off the mode.copy_on_write feature:

# To make Pandas in the rest of this notebook look more like Pandas < 1.5.
pd.set_option('mode.copy_on_write', False)

Old strategy rule 1: copy right

When you fetch data from a Pandas DataFrame or Series by indexing, to use as a right-hand-side value, we strongly suggest that you always force Pandas to take a copy.

We call this rule copy right.

As a reminder, indexing is where we fetch data from something using square brackets. Indexing can be: direct, with the square brackets directly following the DataFrame or Series; or indirect, where the square brackets follow the .loc or .iloc attributes of the DataFrame or Series.
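As a small sketch of the three indexing forms, assuming a hypothetical two-row stand-in for `ratings`: all three fetch the same ‘Clarity’ values. Whether each result is a copy or a view is exactly the question this page is about, so the sketch only compares values.

```python
import pandas as pd

# Hypothetical two-row stand-in for `ratings`, using values from the table above.
ratings = pd.DataFrame(
    {'Discipline': ['English', 'Mathematics'],
     'Clarity': [3.756147, 3.487379]},
    index=['A', 'B'])

direct = ratings['Clarity']            # Direct: brackets right after the DataFrame.
by_label = ratings.loc[:, 'Clarity']   # Indirect: .loc uses row and column labels.
by_position = ratings.iloc[:, 1]       # Indirect: .iloc uses integer positions.

print(direct.equals(by_label), direct.equals(by_position))  # True True
```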

For example, we have just used direct indexing (square brackets) to fetch the ‘Clarity’ data out of the ratings DataFrame.

# Indexing to fetch a Series from a DataFrame.
clarity = ratings['Clarity']

We earlier found that, without mode.copy_on_write, clarity is a view onto the ‘Clarity’ data in ratings. This is rarely what we want.

Here we apply the copy right rule:

# Applying the "copy right" rule.
clearer_clarity = ratings['Clarity'].copy()

Notice we apply the .copy() method to the ‘Clarity’ Series, so forcing Pandas to make and return a copy of the data.

Now we have done that, we can modify the result without affecting the original DataFrame, because we are changing the copy, not the original.

# Modify the copy with some crazy value.
clearer_clarity.loc['A'] = 99
clearer_clarity.head()
A    99.000000
B     3.487379
C     3.608331
D     3.909520
E     3.788818
Name: Clarity, dtype: float64

This does not affect the original DataFrame:

ratings.head()
    Discipline  Number of Professors     Clarity  Helpfulness  Overall Quality  Easiness
A      English                 23343  199.000000     3.821866         3.791364  3.162754
B  Mathematics                 22394    3.487379     3.641526         3.566867  3.063322
C      Biology                 11774    3.608331     3.701530         3.657641  2.710459
D   Psychology                 11179    3.909520     3.887536         3.900949  3.316210
E      History                 11145    3.788818     3.753642         3.773746  3.053803
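A self-contained version of the copy right example, assuming a hypothetical two-row stand-in for `ratings`. Here `np.shares_memory` is one way to confirm that the result of `.copy()` does not share memory with the DataFrame column, so changing it cannot reach the original.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row stand-in for `ratings`, using values from the table above.
ratings = pd.DataFrame(
    {'Discipline': ['English', 'Mathematics'],
     'Clarity': [3.756147, 3.487379]},
    index=['A', 'B'])

# "Copy right": force a copy whenever indexing feeds a right-hand side.
clearer_clarity = ratings['Clarity'].copy()

# The copy has its own memory, separate from the DataFrame column.
print(np.shares_memory(clearer_clarity.to_numpy(),
                       ratings['Clarity'].to_numpy()))   # False

clearer_clarity.loc['A'] = 99          # No warning: we asked for a copy.
print(ratings.loc['A', 'Clarity'])     # 3.756147
```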

A digression: copies, views, confusing, warnings

It can be very difficult to predict when Pandas indexing will give a copy or a view.

For example, here we use indirect indexing (square brackets following .loc) to select the row of ratings with index label ‘A’. Remember .loc indexing uses the index labels.

row_A = ratings.loc['A']
row_A
Discipline               English
Number of Professors       23343
Clarity                    199.0
Helpfulness             3.821866
Overall Quality         3.791364
Easiness                3.162754
Name: A, dtype: object

We saw earlier that direct indexing to select a column ‘Clarity’ gave us a view, meaning that we could change the values in the DataFrame by changing the Series clarity we got from indexing. In fact this is also true if we use indirect indexing with .loc or .iloc. Check this by trying clarity = ratings.loc[:, 'Clarity'] in the code above.

We have just fetched the row labeled ‘A’ using .loc. Given what we know about fetching a column, it would be reasonable to predict this would give us a view.

Does it give a view? Or a copy?

# Changing the 'Clarity' value of the first row.
row_A.loc['Clarity'] = 5
row_A
/tmp/ipykernel_6218/3087147381.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  row_A.loc['Clarity'] = 5
Discipline               English
Number of Professors       23343
Clarity                        5
Helpfulness             3.821866
Overall Quality         3.791364
Easiness                3.162754
Name: A, dtype: object

Notice the warning, again.

But - this time - did we change the original DataFrame?

ratings.head()
    Discipline  Number of Professors     Clarity  Helpfulness  Overall Quality  Easiness
A      English                 23343  199.000000     3.821866         3.791364  3.162754
B  Mathematics                 22394    3.487379     3.641526         3.566867  3.063322
C      Biology                 11774    3.608331     3.701530         3.657641  2.710459
D   Psychology                 11179    3.909520     3.887536         3.900949  3.316210
E      History                 11145    3.788818     3.753642         3.773746  3.053803

No, we didn’t change the original DataFrame — and we conclude that row_A is a copy.
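One way to see why this row has to be a copy is to look at the dtypes. Each column has its own dtype, so a row that spans a text column, an integer column and a float column has to be assembled into a new Series with the general `object` dtype (as in the `dtype: object` output above), which cannot share memory with the original columns. This sketch uses a hypothetical three-column stand-in for `ratings`. Treat it as a rule of thumb for mixed-dtype DataFrames rather than a complete account of when Pandas copies: if every column has the same dtype, selecting a row may give a view.

```python
import pandas as pd

# Hypothetical stand-in for `ratings`: text, integer and float columns.
ratings = pd.DataFrame(
    {'Discipline': ['English', 'Mathematics'],
     'Number of Professors': [23343, 22394],
     'Clarity': [3.756147, 3.487379]},
    index=['A', 'B'])

print(ratings.dtypes)   # One dtype per column.

row_A = ratings.loc['A']
print(row_A.dtype)      # object: a single Series must hold all three kinds.
```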

Our first, correct, response is to follow the copy right rule, and make this copy explicit, so we know exactly what we have:

# The "copy right" rule again.
copied_row_A = ratings.loc['A'].copy()

We no longer have a nasty warning when we modify copied_row_A, because Pandas knows we made a copy, so it does not need to warn us that we may be making a mistake:

# We don't get a warning when we change the copied result.
copied_row_A.loc['Clarity'] = 5
copied_row_A
Discipline               English
Number of Professors       23343
Clarity                        5
Helpfulness             3.821866
Overall Quality         3.791364
Easiness                3.162754
Name: A, dtype: object

Please do worry about these warnings. In fact, in the interests of safety, we come to old strategy rule 2.

Old strategy rule 2: make errors for copy/view warnings

Pandas has a setting that allows you to change the nasty warning about setting with copies into an error.

If you can’t enable mode.copy_on_write as above, we strongly suggest that you do enable these errors, for all your notebooks, like this:

pd.set_option('mode.chained_assignment', 'raise')

After you have set this option, Pandas will stop if you try to do something like the following:

row_A = ratings.loc['A']   # Copy?  Or view?  Difficult to guess.
# Now this generates an error.
row_A.loc['Clarity'] = 299
---------------------------------------------------------------------------
SettingWithCopyError                      Traceback (most recent call last)
/tmp/ipykernel_6218/1769055592.py in ?()
      1 row_A = ratings.loc['A']   # Copy?  Or view?  Difficult to guess.
      2 # Now this generates an error.
----> 3 row_A.loc['Clarity'] = 299

SettingWithCopyError: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

At first you will find this advice annoying. Your code will generate confusing errors, and you will be tempted to remove this error option to make the errors go away. Please be patient. You will find that, if you follow the copy right rule carefully, most of these errors go away.
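This sketch shows what the `raise` setting does, assuming Pandas 1.5 to 2.x, where `pd.errors.SettingWithCopyError` and both options exist. It uses `pd.option_context` to apply the settings only inside the `with` block, and a hypothetical two-row stand-in for `ratings`. Because Pandas 3.0 makes copy-on-write the default, we assume the older warning machinery may not be there, so the sketch skips the demonstration on Pandas 3.0 and later.

```python
import pandas as pd

# Hypothetical two-row stand-in for `ratings`, using values from the table above.
ratings = pd.DataFrame(
    {'Discipline': ['English', 'Mathematics'],
     'Clarity': [3.756147, 3.487379]},
    index=['A', 'B'])

# Assumption: this demonstration targets Pandas 1.5 to 2.x.
major = int(pd.__version__.split('.')[0])
outcome = 'skipped (Pandas 3.0 or later)'

if major < 3:
    # Old behavior, plus rule 2 (turn copy/view warnings into errors),
    # applied only inside this block.
    with pd.option_context('mode.copy_on_write', False,
                           'mode.chained_assignment', 'raise'):
        row_A = ratings.loc['A']   # Copy? Or view? Difficult to guess.
        try:
            row_A.loc['Clarity'] = 299
            outcome = 'no error'
        except pd.errors.SettingWithCopyError:
            outcome = 'SettingWithCopyError'

print(outcome)
# The original is untouched: either the error stopped the assignment,
# or the assignment only reached a copy.
print(ratings.loc['A', 'Clarity'])   # 3.756147
```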

Another digression: copy, views, on the left

There is more discussion of this subject in the Gory Pandas page.

If you are reading this page from start to finish, you will already have seen our discussion of chained assignment above. Here we repeat ourselves a little for the sake of our less linear readers. Consider this code:

ratings.loc['A'].loc['Clarity'] = 299
/tmp/ipykernel_6218/1657792584.py:1: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  ratings.loc['A'].loc['Clarity'] = 299
---------------------------------------------------------------------------
SettingWithCopyError                      Traceback (most recent call last)
/tmp/ipykernel_6218/1657792584.py in ?()
----> 1 ratings.loc['A'].loc['Clarity'] = 299

SettingWithCopyError: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

Because we set the mode.chained_assignment option to 'raise' above, this generates an error. But why?

The reason is the same as the reason for the previous error. The code in the cell directly above is just a short-cut for this exact equivalent.

tmp = ratings.loc['A']
tmp.loc['Clarity'] = 299
---------------------------------------------------------------------------
SettingWithCopyError                      Traceback (most recent call last)
/tmp/ipykernel_6218/2016657039.py in ?()
      1 tmp = ratings.loc['A']
----> 2 tmp.loc['Clarity'] = 299

SettingWithCopyError: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

Specifically, when Python sees ratings.loc['A'].loc['Clarity'] = 299, it first evaluates ratings.loc['A'] to generate a temporary copy. In the code above, we called this temporary copy tmp. It then tries to set the value into the copy with tmp.loc['Clarity'] = 299. This generates the same error as you saw before.

As you have probably guessed from the option name above, Pandas calls this chained assignment, because you are first fetching the data you want to do the assignment on (ratings.loc['A']) and then doing the assignment .loc['Clarity'] = 299. There are two steps on the left-hand side, in a chain: first fetching the data, then assigning.

The problem that Pandas has is that it cannot tell that this chained assignment has happened, so it can’t tell what you mean. Python will ask Pandas to generate ratings.loc['A'] first, which it does, to generate the temporary copy that we can call tmp. Python then asks Pandas to set the value with tmp.loc['Clarity'] = 299. When Pandas gets this second instruction, it has no way of knowing that tmp came from the combined instruction ratings.loc['A'].loc['Clarity'] = 299, and so all it can do is set the value into the copy, as instructed.
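The two-step evaluation is ordinary Python, not something special to Pandas. This sketch uses a hypothetical toy class, `LoggingTable` (not part of Pandas), whose `__getitem__` returns a copy of the selected row and which logs every call. It shows that, for a chained assignment, the outer object only receives a "get"; the "set" goes to the temporary copy.

```python
class LoggingTable:
    """Hypothetical toy container, not part of Pandas.

    Indexing returns a *copy* of the selected row, and every call is logged,
    so we can see exactly what Python asks each object to do.
    """

    def __init__(self, data, log):
        self.data = data
        self.log = log

    def __getitem__(self, key):
        self.log.append(('get', key))
        # Hand back a copy of the selected row, wrapped so it logs too.
        return LoggingTable(dict(self.data[key]), self.log)

    def __setitem__(self, key, value):
        self.log.append(('set', key, value))
        self.data[key] = value


log = []
table = LoggingTable({'A': {'Clarity': 4.0}}, log)

# Chained assignment: Python runs table['A'] first, then sets on the result.
table['A']['Clarity'] = 299

print(log)                          # [('get', 'A'), ('set', 'Clarity', 299)]
print(table.data['A']['Clarity'])   # 4.0: only the temporary copy changed.
```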

This leads us to the last rule.

Old strategy rule 3: loc left

When you do want to use indexing on the left-hand side, to set some values into a DataFrame or Series, try to do this all in one shot, using indirect indexing with .loc or .iloc.

For example, you have just seen that this generates an error, and why:

ratings.loc['A'].loc['Clarity'] = 299
/tmp/ipykernel_6218/1657792584.py:1: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  ratings.loc['A'].loc['Clarity'] = 299
---------------------------------------------------------------------------
SettingWithCopyError                      Traceback (most recent call last)
/tmp/ipykernel_6218/1657792584.py in ?()
----> 1 ratings.loc['A'].loc['Clarity'] = 299

SettingWithCopyError: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

You can avoid that error by doing all your left-hand-side indexing in one shot, like this:

ratings.loc['A', 'Clarity'] = 299
ratings.loc['A']
Discipline               English
Number of Professors       23343
Clarity                    299.0
Helpfulness             3.821866
Overall Quality         3.791364
Easiness                3.162754
Name: A, dtype: object

Notice there is no error. This is because, in this second case, Pandas gets all the instructions in one go. It can see from this combined instruction that we meant to set the ‘Clarity’ value for the row labeled ‘A’ in the ratings DataFrame, and does just this.
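The loc left rule covers more than single cells. This sketch, on a hypothetical three-row stand-in for `ratings`, shows the one-step form with a single label, a list of labels, a Boolean mask, and `.iloc` positions. Each assignment is a single indexing operation on the DataFrame itself, so it should update `ratings` whether or not copy-on-write is enabled.

```python
import pandas as pd

# Hypothetical three-row stand-in for `ratings`, using values from the table above.
ratings = pd.DataFrame(
    {'Discipline': ['English', 'Mathematics', 'Biology'],
     'Clarity': [3.756147, 3.487379, 3.608331]},
    index=['A', 'B', 'C'])

# One row label, one column label: a single, unchained assignment.
ratings.loc['A', 'Clarity'] = 4.0

# The same pattern works for several rows at once...
ratings.loc[['B', 'C'], 'Clarity'] = 3.5

# ...or for rows chosen by a Boolean mask, still in one step.
ratings.loc[ratings['Discipline'] == 'Biology', 'Clarity'] = 3.6

# .iloc does the same job with integer positions (row 0, column 1).
ratings.iloc[0, 1] = 4.1

print(ratings)
```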

Old strategy summary: keep calm, follow the three rules

Do not worry if some of this is not immediately clear; it is not easy.

The trick is to remember the three rules:

  • Copy right.

  • Make copy warnings into errors.

  • Use .loc and .iloc for your left-hand-side indexing.