Assignment with indexing#
The indexing catechism page goes through ways of selecting rows and columns by Pandas indexing.
In each case in that page, we are using indexing in expressions.
We can use the indexing on the left hand side of an assignment statement, to select values we will change.
# Load the Numpy library, rename to "np"
import numpy as np
# Load the Pandas data science library, rename to "pd"
import pandas as pd
# Safe setting for Pandas. Needs Pandas version >= 1.5.
pd.set_option('mode.copy_on_write', True)
# Read the CSV to a DataFrame.
ratings = pd.read_csv('rate_my_course.csv')
# Change the default numerical row labels to be the discipline name.
named_ratings = ratings.set_index('Discipline')
named_ratings.head()
Number of Professors | Clarity | Helpfulness | Overall Quality | Easiness | |
---|---|---|---|---|---|
Discipline | |||||
English | 23343 | 3.756147 | 3.821866 | 3.791364 | 3.162754 |
Mathematics | 22394 | 3.487379 | 3.641526 | 3.566867 | 3.063322 |
Biology | 11774 | 3.608331 | 3.701530 | 3.657641 | 2.710459 |
Psychology | 11179 | 3.909520 | 3.887536 | 3.900949 | 3.316210 |
History | 11145 | 3.788818 | 3.753642 | 3.773746 | 3.053803 |
Remember simple indexing is direct indexing with:
Column labels (DICL).
Boolean Series (DIBS).
Assignment by direct indexing with Column Labels#
# DICL - one column label between square brackets _sets_ a Series.
clarity = named_ratings['Clarity'] # DICL on right hand side (RHS)
helpfulness = named_ratings['Helpfulness'] # DICL on RHS
# DICL on the left hand side (LHS).
# _Sets_ new column in DataFrame
named_ratings['combined'] = clarity + helpfulness
named_ratings
Number of Professors | Clarity | Helpfulness | Overall Quality | Easiness | combined | |
---|---|---|---|---|---|---|
Discipline | ||||||
English | 23343 | 3.756147 | 3.821866 | 3.791364 | 3.162754 | 7.578013 |
Mathematics | 22394 | 3.487379 | 3.641526 | 3.566867 | 3.063322 | 7.128906 |
Biology | 11774 | 3.608331 | 3.701530 | 3.657641 | 2.710459 | 7.309862 |
Psychology | 11179 | 3.909520 | 3.887536 | 3.900949 | 3.316210 | 7.797055 |
History | 11145 | 3.788818 | 3.753642 | 3.773746 | 3.053803 | 7.542460 |
... | ... | ... | ... | ... | ... | ... |
Anatomy | 117 | 3.932991 | 3.974701 | 3.954188 | 2.863504 | 7.907692 |
Earth Science | 110 | 3.636182 | 3.671364 | 3.655091 | 3.106727 | 7.307545 |
Linguistics | 110 | 3.749000 | 3.834545 | 3.798182 | 3.309636 | 7.583545 |
Mechanical Engineering | 104 | 3.441923 | 3.531154 | 3.489327 | 2.799135 | 6.973077 |
Medicine | 102 | 3.927255 | 3.934216 | 3.929118 | 3.109118 | 7.861471 |
75 rows × 6 columns
Assignment by direct indexing with Boolean Series#
# Create a Boolean Series by doing a comparison on a Series.
not_clear = clarity < 3.5
not_clear
Discipline
English False
Mathematics True
Biology False
Psychology False
History False
...
Anatomy False
Earth Science False
Linguistics False
Mechanical Engineering True
Medicine False
Name: Clarity, Length: 75, dtype: bool
# DIBS selects *rows*, giving a DataFrame with one row per True
# Here we set all the values in the not-clear rows to be missing
# values.
unclear_ratings = named_ratings.copy() # Make a copy before changing.
unclear_ratings[not_clear] = np.nan
unclear_ratings
Number of Professors | Clarity | Helpfulness | Overall Quality | Easiness | combined | |
---|---|---|---|---|---|---|
Discipline | ||||||
English | 23343.0 | 3.756147 | 3.821866 | 3.791364 | 3.162754 | 7.578013 |
Mathematics | NaN | NaN | NaN | NaN | NaN | NaN |
Biology | 11774.0 | 3.608331 | 3.701530 | 3.657641 | 2.710459 | 7.309862 |
Psychology | 11179.0 | 3.909520 | 3.887536 | 3.900949 | 3.316210 | 7.797055 |
History | 11145.0 | 3.788818 | 3.753642 | 3.773746 | 3.053803 | 7.542460 |
... | ... | ... | ... | ... | ... | ... |
Anatomy | 117.0 | 3.932991 | 3.974701 | 3.954188 | 2.863504 | 7.907692 |
Earth Science | 110.0 | 3.636182 | 3.671364 | 3.655091 | 3.106727 | 7.307545 |
Linguistics | 110.0 | 3.749000 | 3.834545 | 3.798182 | 3.309636 | 7.583545 |
Mechanical Engineering | NaN | NaN | NaN | NaN | NaN | NaN |
Medicine | 102.0 | 3.927255 | 3.934216 | 3.929118 | 3.109118 | 7.861471 |
75 rows × 6 columns
Setting values with .loc
#
# Set values a single row by label.
unclear_plus_english = unclear_ratings.copy()
unclear_plus_english.loc['English'] = np.nan
unclear_plus_english
Number of Professors | Clarity | Helpfulness | Overall Quality | Easiness | combined | |
---|---|---|---|---|---|---|
Discipline | ||||||
English | NaN | NaN | NaN | NaN | NaN | NaN |
Mathematics | NaN | NaN | NaN | NaN | NaN | NaN |
Biology | 11774.0 | 3.608331 | 3.701530 | 3.657641 | 2.710459 | 7.309862 |
Psychology | 11179.0 | 3.909520 | 3.887536 | 3.900949 | 3.316210 | 7.797055 |
History | 11145.0 | 3.788818 | 3.753642 | 3.773746 | 3.053803 | 7.542460 |
... | ... | ... | ... | ... | ... | ... |
Anatomy | 117.0 | 3.932991 | 3.974701 | 3.954188 | 2.863504 | 7.907692 |
Earth Science | 110.0 | 3.636182 | 3.671364 | 3.655091 | 3.106727 | 7.307545 |
Linguistics | 110.0 | 3.749000 | 3.834545 | 3.798182 | 3.309636 | 7.583545 |
Mechanical Engineering | NaN | NaN | NaN | NaN | NaN | NaN |
Medicine | 102.0 | 3.927255 | 3.934216 | 3.929118 | 3.109118 | 7.861471 |
75 rows × 6 columns
# Set a single value by specifying the row and column label.
unclear_ratings.loc['English', 'Clarity'] = 3
unclear_ratings
Number of Professors | Clarity | Helpfulness | Overall Quality | Easiness | combined | |
---|---|---|---|---|---|---|
Discipline | ||||||
English | 23343.0 | 3.000000 | 3.821866 | 3.791364 | 3.162754 | 7.578013 |
Mathematics | NaN | NaN | NaN | NaN | NaN | NaN |
Biology | 11774.0 | 3.608331 | 3.701530 | 3.657641 | 2.710459 | 7.309862 |
Psychology | 11179.0 | 3.909520 | 3.887536 | 3.900949 | 3.316210 | 7.797055 |
History | 11145.0 | 3.788818 | 3.753642 | 3.773746 | 3.053803 | 7.542460 |
... | ... | ... | ... | ... | ... | ... |
Anatomy | 117.0 | 3.932991 | 3.974701 | 3.954188 | 2.863504 | 7.907692 |
Earth Science | 110.0 | 3.636182 | 3.671364 | 3.655091 | 3.106727 | 7.307545 |
Linguistics | 110.0 | 3.749000 | 3.834545 | 3.798182 | 3.309636 | 7.583545 |
Mechanical Engineering | NaN | NaN | NaN | NaN | NaN | NaN |
Medicine | 102.0 | 3.927255 | 3.934216 | 3.929118 | 3.109118 | 7.861471 |
75 rows × 6 columns
You can set values with Boolean Series using indirect indexing:
# Select rows where clarity < 3 to 0 using .loc with Boolean Series.
unclear_0 = unclear_ratings.copy()
unclear_0.loc[not_clear] = 0
unclear_0
Number of Professors | Clarity | Helpfulness | Overall Quality | Easiness | combined | |
---|---|---|---|---|---|---|
Discipline | ||||||
English | 23343.0 | 3.000000 | 3.821866 | 3.791364 | 3.162754 | 7.578013 |
Mathematics | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
Biology | 11774.0 | 3.608331 | 3.701530 | 3.657641 | 2.710459 | 7.309862 |
Psychology | 11179.0 | 3.909520 | 3.887536 | 3.900949 | 3.316210 | 7.797055 |
History | 11145.0 | 3.788818 | 3.753642 | 3.773746 | 3.053803 | 7.542460 |
... | ... | ... | ... | ... | ... | ... |
Anatomy | 117.0 | 3.932991 | 3.974701 | 3.954188 | 2.863504 | 7.907692 |
Earth Science | 110.0 | 3.636182 | 3.671364 | 3.655091 | 3.106727 | 7.307545 |
Linguistics | 110.0 | 3.749000 | 3.834545 | 3.798182 | 3.309636 | 7.583545 |
Mechanical Engineering | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
Medicine | 102.0 | 3.927255 | 3.934216 | 3.929118 | 3.109118 | 7.861471 |
75 rows × 6 columns
# Set Easiness column from rows where Clarity < 3
unclear_ratings.loc[not_clear, 'Easiness'] = -100
unclear_ratings
Number of Professors | Clarity | Helpfulness | Overall Quality | Easiness | combined | |
---|---|---|---|---|---|---|
Discipline | ||||||
English | 23343.0 | 3.000000 | 3.821866 | 3.791364 | 3.162754 | 7.578013 |
Mathematics | NaN | NaN | NaN | NaN | -100.000000 | NaN |
Biology | 11774.0 | 3.608331 | 3.701530 | 3.657641 | 2.710459 | 7.309862 |
Psychology | 11179.0 | 3.909520 | 3.887536 | 3.900949 | 3.316210 | 7.797055 |
History | 11145.0 | 3.788818 | 3.753642 | 3.773746 | 3.053803 | 7.542460 |
... | ... | ... | ... | ... | ... | ... |
Anatomy | 117.0 | 3.932991 | 3.974701 | 3.954188 | 2.863504 | 7.907692 |
Earth Science | 110.0 | 3.636182 | 3.671364 | 3.655091 | 3.106727 | 7.307545 |
Linguistics | 110.0 | 3.749000 | 3.834545 | 3.798182 | 3.309636 | 7.583545 |
Mechanical Engineering | NaN | NaN | NaN | NaN | -100.000000 | NaN |
Medicine | 102.0 | 3.927255 | 3.934216 | 3.929118 | 3.109118 | 7.861471 |
75 rows × 6 columns
Setting values using .iloc
#
# Set a single row by position
# The first row (by position) is English.
unclear_ratings.iloc[0] = 99
unclear_ratings
Number of Professors | Clarity | Helpfulness | Overall Quality | Easiness | combined | |
---|---|---|---|---|---|---|
Discipline | ||||||
English | 99.0 | 99.000000 | 99.000000 | 99.000000 | 99.000000 | 99.000000 |
Mathematics | NaN | NaN | NaN | NaN | -100.000000 | NaN |
Biology | 11774.0 | 3.608331 | 3.701530 | 3.657641 | 2.710459 | 7.309862 |
Psychology | 11179.0 | 3.909520 | 3.887536 | 3.900949 | 3.316210 | 7.797055 |
History | 11145.0 | 3.788818 | 3.753642 | 3.773746 | 3.053803 | 7.542460 |
... | ... | ... | ... | ... | ... | ... |
Anatomy | 117.0 | 3.932991 | 3.974701 | 3.954188 | 2.863504 | 7.907692 |
Earth Science | 110.0 | 3.636182 | 3.671364 | 3.655091 | 3.106727 | 7.307545 |
Linguistics | 110.0 | 3.749000 | 3.834545 | 3.798182 | 3.309636 | 7.583545 |
Mechanical Engineering | NaN | NaN | NaN | NaN | -100.000000 | NaN |
Medicine | 102.0 | 3.927255 | 3.934216 | 3.929118 | 3.109118 | 7.861471 |
75 rows × 6 columns
# Set a single value by specifying the row and column positions.
# Easiness is the last column (position -1).
unclear_ratings.iloc[0, -1] = 11
unclear_ratings
Number of Professors | Clarity | Helpfulness | Overall Quality | Easiness | combined | |
---|---|---|---|---|---|---|
Discipline | ||||||
English | 99.0 | 99.000000 | 99.000000 | 99.000000 | 99.000000 | 11.000000 |
Mathematics | NaN | NaN | NaN | NaN | -100.000000 | NaN |
Biology | 11774.0 | 3.608331 | 3.701530 | 3.657641 | 2.710459 | 7.309862 |
Psychology | 11179.0 | 3.909520 | 3.887536 | 3.900949 | 3.316210 | 7.797055 |
History | 11145.0 | 3.788818 | 3.753642 | 3.773746 | 3.053803 | 7.542460 |
... | ... | ... | ... | ... | ... | ... |
Anatomy | 117.0 | 3.932991 | 3.974701 | 3.954188 | 2.863504 | 7.907692 |
Earth Science | 110.0 | 3.636182 | 3.671364 | 3.655091 | 3.106727 | 7.307545 |
Linguistics | 110.0 | 3.749000 | 3.834545 | 3.798182 | 3.309636 | 7.583545 |
Mechanical Engineering | NaN | NaN | NaN | NaN | -100.000000 | NaN |
Medicine | 102.0 | 3.927255 | 3.934216 | 3.929118 | 3.109118 | 7.861471 |
75 rows × 6 columns
Use indirect indexing to set values into a Series#
You can do assignment by indexing into Series values. Use indirect indexing to do this.
# Remind us what clarity values are
clarity
Discipline
English 3.756147
Mathematics 3.487379
Biology 3.608331
Psychology 3.909520
History 3.788818
...
Anatomy 3.932991
Earth Science 3.636182
Linguistics 3.749000
Mechanical Engineering 3.441923
Medicine 3.927255
Name: Clarity, Length: 75, dtype: float64
# Assign by label.
clarity.loc['English'] = 3
clarity
Discipline
English 3.000000
Mathematics 3.487379
Biology 3.608331
Psychology 3.909520
History 3.788818
...
Anatomy 3.932991
Earth Science 3.636182
Linguistics 3.749000
Mechanical Engineering 3.441923
Medicine 3.927255
Name: Clarity, Length: 75, dtype: float64
# Set using Boolean Series.
clarity.loc[not_clear] = 0
clarity
Discipline
English 3.000000
Mathematics 0.000000
Biology 3.608331
Psychology 3.909520
History 3.788818
...
Anatomy 3.932991
Earth Science 3.636182
Linguistics 3.749000
Mechanical Engineering 0.000000
Medicine 3.927255
Name: Clarity, Length: 75, dtype: float64
# Set by position.
# English is the first value (at position 0).
clarity.iloc[0] = 11
clarity
Discipline
English 11.000000
Mathematics 0.000000
Biology 3.608331
Psychology 3.909520
History 3.788818
...
Anatomy 3.932991
Earth Science 3.636182
Linguistics 3.749000
Mechanical Engineering 0.000000
Medicine 3.927255
Name: Clarity, Length: 75, dtype: float64