Assignment with indexing#

The indexing catechism page goes through ways of selecting rows and columns by Pandas indexing.

In each case in that page, we are using indexing in expressions.

We can use the indexing on the left hand side of an assignment statement, to select values we will change.

# Load the Numpy library, rename to "np"
import numpy as np

# Load the Pandas data science library, rename to "pd"
import pandas as pd

# Safe setting for Pandas.  Needs Pandas version >= 1.5.
pd.set_option('mode.copy_on_write', True)
# Read the CSV to a DataFrame.
ratings = pd.read_csv('rate_my_course.csv')
# Change the default numerical row labels to be the discipline name.
named_ratings = ratings.set_index('Discipline')
named_ratings.head()
Number of Professors Clarity Helpfulness Overall Quality Easiness
Discipline
English 23343 3.756147 3.821866 3.791364 3.162754
Mathematics 22394 3.487379 3.641526 3.566867 3.063322
Biology 11774 3.608331 3.701530 3.657641 2.710459
Psychology 11179 3.909520 3.887536 3.900949 3.316210
History 11145 3.788818 3.753642 3.773746 3.053803

Remember simple indexing is direct indexing with:

  • Column labels (DICL).

  • Boolean Series (DIBS).

Assignment by direct indexing with Column Labels#

# DICL - one column label between square brackets _sets_ a Series.
clarity = named_ratings['Clarity']  # DICL on right hand side (RHS)
helpfulness = named_ratings['Helpfulness']  # DICL on RHS
# DICL on the left hand side (LHS).
# _Sets_ new column in DataFrame
named_ratings['combined'] = clarity + helpfulness
named_ratings
Number of Professors Clarity Helpfulness Overall Quality Easiness combined
Discipline
English 23343 3.756147 3.821866 3.791364 3.162754 7.578013
Mathematics 22394 3.487379 3.641526 3.566867 3.063322 7.128906
Biology 11774 3.608331 3.701530 3.657641 2.710459 7.309862
Psychology 11179 3.909520 3.887536 3.900949 3.316210 7.797055
History 11145 3.788818 3.753642 3.773746 3.053803 7.542460
... ... ... ... ... ... ...
Anatomy 117 3.932991 3.974701 3.954188 2.863504 7.907692
Earth Science 110 3.636182 3.671364 3.655091 3.106727 7.307545
Linguistics 110 3.749000 3.834545 3.798182 3.309636 7.583545
Mechanical Engineering 104 3.441923 3.531154 3.489327 2.799135 6.973077
Medicine 102 3.927255 3.934216 3.929118 3.109118 7.861471

75 rows × 6 columns

Assignment by direct indexing with Boolean Series#

# Create a Boolean Series by doing a comparison on a Series.
not_clear = clarity < 3.5
not_clear
Discipline
English                   False
Mathematics                True
Biology                   False
Psychology                False
History                   False
                          ...  
Anatomy                   False
Earth Science             False
Linguistics               False
Mechanical Engineering     True
Medicine                  False
Name: Clarity, Length: 75, dtype: bool
# DIBS selects *rows*, giving a DataFrame with one row per True
# Here we set all the values in the not-clear rows to be missing
# values.
unclear_ratings = named_ratings.copy()  # Make a copy before changing.
unclear_ratings[not_clear] = np.nan
unclear_ratings
Number of Professors Clarity Helpfulness Overall Quality Easiness combined
Discipline
English 23343.0 3.756147 3.821866 3.791364 3.162754 7.578013
Mathematics NaN NaN NaN NaN NaN NaN
Biology 11774.0 3.608331 3.701530 3.657641 2.710459 7.309862
Psychology 11179.0 3.909520 3.887536 3.900949 3.316210 7.797055
History 11145.0 3.788818 3.753642 3.773746 3.053803 7.542460
... ... ... ... ... ... ...
Anatomy 117.0 3.932991 3.974701 3.954188 2.863504 7.907692
Earth Science 110.0 3.636182 3.671364 3.655091 3.106727 7.307545
Linguistics 110.0 3.749000 3.834545 3.798182 3.309636 7.583545
Mechanical Engineering NaN NaN NaN NaN NaN NaN
Medicine 102.0 3.927255 3.934216 3.929118 3.109118 7.861471

75 rows × 6 columns

Setting values with .loc#

# Set values a single row by label.
unclear_plus_english = unclear_ratings.copy()
unclear_plus_english.loc['English'] = np.nan
unclear_plus_english
Number of Professors Clarity Helpfulness Overall Quality Easiness combined
Discipline
English NaN NaN NaN NaN NaN NaN
Mathematics NaN NaN NaN NaN NaN NaN
Biology 11774.0 3.608331 3.701530 3.657641 2.710459 7.309862
Psychology 11179.0 3.909520 3.887536 3.900949 3.316210 7.797055
History 11145.0 3.788818 3.753642 3.773746 3.053803 7.542460
... ... ... ... ... ... ...
Anatomy 117.0 3.932991 3.974701 3.954188 2.863504 7.907692
Earth Science 110.0 3.636182 3.671364 3.655091 3.106727 7.307545
Linguistics 110.0 3.749000 3.834545 3.798182 3.309636 7.583545
Mechanical Engineering NaN NaN NaN NaN NaN NaN
Medicine 102.0 3.927255 3.934216 3.929118 3.109118 7.861471

75 rows × 6 columns

# Set a single value by specifying the row and column label.
unclear_ratings.loc['English', 'Clarity'] = 3
unclear_ratings
Number of Professors Clarity Helpfulness Overall Quality Easiness combined
Discipline
English 23343.0 3.000000 3.821866 3.791364 3.162754 7.578013
Mathematics NaN NaN NaN NaN NaN NaN
Biology 11774.0 3.608331 3.701530 3.657641 2.710459 7.309862
Psychology 11179.0 3.909520 3.887536 3.900949 3.316210 7.797055
History 11145.0 3.788818 3.753642 3.773746 3.053803 7.542460
... ... ... ... ... ... ...
Anatomy 117.0 3.932991 3.974701 3.954188 2.863504 7.907692
Earth Science 110.0 3.636182 3.671364 3.655091 3.106727 7.307545
Linguistics 110.0 3.749000 3.834545 3.798182 3.309636 7.583545
Mechanical Engineering NaN NaN NaN NaN NaN NaN
Medicine 102.0 3.927255 3.934216 3.929118 3.109118 7.861471

75 rows × 6 columns

You can set values with Boolean Series using indirect indexing:

# Select rows where clarity < 3 to 0 using .loc with Boolean Series.
unclear_0 = unclear_ratings.copy()
unclear_0.loc[not_clear] = 0
unclear_0
Number of Professors Clarity Helpfulness Overall Quality Easiness combined
Discipline
English 23343.0 3.000000 3.821866 3.791364 3.162754 7.578013
Mathematics 0.0 0.000000 0.000000 0.000000 0.000000 0.000000
Biology 11774.0 3.608331 3.701530 3.657641 2.710459 7.309862
Psychology 11179.0 3.909520 3.887536 3.900949 3.316210 7.797055
History 11145.0 3.788818 3.753642 3.773746 3.053803 7.542460
... ... ... ... ... ... ...
Anatomy 117.0 3.932991 3.974701 3.954188 2.863504 7.907692
Earth Science 110.0 3.636182 3.671364 3.655091 3.106727 7.307545
Linguistics 110.0 3.749000 3.834545 3.798182 3.309636 7.583545
Mechanical Engineering 0.0 0.000000 0.000000 0.000000 0.000000 0.000000
Medicine 102.0 3.927255 3.934216 3.929118 3.109118 7.861471

75 rows × 6 columns

# Set Easiness column from rows where Clarity < 3
unclear_ratings.loc[not_clear, 'Easiness'] = -100
unclear_ratings
Number of Professors Clarity Helpfulness Overall Quality Easiness combined
Discipline
English 23343.0 3.000000 3.821866 3.791364 3.162754 7.578013
Mathematics NaN NaN NaN NaN -100.000000 NaN
Biology 11774.0 3.608331 3.701530 3.657641 2.710459 7.309862
Psychology 11179.0 3.909520 3.887536 3.900949 3.316210 7.797055
History 11145.0 3.788818 3.753642 3.773746 3.053803 7.542460
... ... ... ... ... ... ...
Anatomy 117.0 3.932991 3.974701 3.954188 2.863504 7.907692
Earth Science 110.0 3.636182 3.671364 3.655091 3.106727 7.307545
Linguistics 110.0 3.749000 3.834545 3.798182 3.309636 7.583545
Mechanical Engineering NaN NaN NaN NaN -100.000000 NaN
Medicine 102.0 3.927255 3.934216 3.929118 3.109118 7.861471

75 rows × 6 columns

Setting values using .iloc#

# Set a single row by position
# The first row (by position) is English.
unclear_ratings.iloc[0] = 99
unclear_ratings
Number of Professors Clarity Helpfulness Overall Quality Easiness combined
Discipline
English 99.0 99.000000 99.000000 99.000000 99.000000 99.000000
Mathematics NaN NaN NaN NaN -100.000000 NaN
Biology 11774.0 3.608331 3.701530 3.657641 2.710459 7.309862
Psychology 11179.0 3.909520 3.887536 3.900949 3.316210 7.797055
History 11145.0 3.788818 3.753642 3.773746 3.053803 7.542460
... ... ... ... ... ... ...
Anatomy 117.0 3.932991 3.974701 3.954188 2.863504 7.907692
Earth Science 110.0 3.636182 3.671364 3.655091 3.106727 7.307545
Linguistics 110.0 3.749000 3.834545 3.798182 3.309636 7.583545
Mechanical Engineering NaN NaN NaN NaN -100.000000 NaN
Medicine 102.0 3.927255 3.934216 3.929118 3.109118 7.861471

75 rows × 6 columns

# Set a single value by specifying the row and column positions.
# Easiness is the last column (position -1).
unclear_ratings.iloc[0, -1] = 11
unclear_ratings
Number of Professors Clarity Helpfulness Overall Quality Easiness combined
Discipline
English 99.0 99.000000 99.000000 99.000000 99.000000 11.000000
Mathematics NaN NaN NaN NaN -100.000000 NaN
Biology 11774.0 3.608331 3.701530 3.657641 2.710459 7.309862
Psychology 11179.0 3.909520 3.887536 3.900949 3.316210 7.797055
History 11145.0 3.788818 3.753642 3.773746 3.053803 7.542460
... ... ... ... ... ... ...
Anatomy 117.0 3.932991 3.974701 3.954188 2.863504 7.907692
Earth Science 110.0 3.636182 3.671364 3.655091 3.106727 7.307545
Linguistics 110.0 3.749000 3.834545 3.798182 3.309636 7.583545
Mechanical Engineering NaN NaN NaN NaN -100.000000 NaN
Medicine 102.0 3.927255 3.934216 3.929118 3.109118 7.861471

75 rows × 6 columns

Use indirect indexing to set values into a Series#

You can do assignment by indexing into Series values. Use indirect indexing to do this.

# Remind us what clarity values are
clarity
Discipline
English                   3.756147
Mathematics               3.487379
Biology                   3.608331
Psychology                3.909520
History                   3.788818
                            ...   
Anatomy                   3.932991
Earth Science             3.636182
Linguistics               3.749000
Mechanical Engineering    3.441923
Medicine                  3.927255
Name: Clarity, Length: 75, dtype: float64
# Assign by label.
clarity.loc['English'] = 3
clarity
Discipline
English                   3.000000
Mathematics               3.487379
Biology                   3.608331
Psychology                3.909520
History                   3.788818
                            ...   
Anatomy                   3.932991
Earth Science             3.636182
Linguistics               3.749000
Mechanical Engineering    3.441923
Medicine                  3.927255
Name: Clarity, Length: 75, dtype: float64
# Set using Boolean Series.
clarity.loc[not_clear] = 0
clarity
Discipline
English                   3.000000
Mathematics               0.000000
Biology                   3.608331
Psychology                3.909520
History                   3.788818
                            ...   
Anatomy                   3.932991
Earth Science             3.636182
Linguistics               3.749000
Mechanical Engineering    0.000000
Medicine                  3.927255
Name: Clarity, Length: 75, dtype: float64
# Set by position.
# English is the first value (at position 0).
clarity.iloc[0] = 11
clarity
Discipline
English                   11.000000
Mathematics                0.000000
Biology                    3.608331
Psychology                 3.909520
History                    3.788818
                            ...    
Anatomy                    3.932991
Earth Science              3.636182
Linguistics                3.749000
Mechanical Engineering     0.000000
Medicine                   3.927255
Name: Clarity, Length: 75, dtype: float64