Assignment with indexing

Assignment with indexing#

The indexing catechism page goes through ways of selecting rows and columns by Pandas indexing.

In each case in that page, we are using indexing in expressions.

We can use the indexing on the left hand side of an assignment statement, to select values we will change.

# Load the Numpy library, rename to "np"
import numpy as np

# Load the Pandas data science library, rename to "pd"
import pandas as pd

# Safe setting for Pandas.  Needs Pandas version >= 1.5.
pd.set_option('mode.copy_on_write', True)

# Read the CSV to a DataFrame.
ratings = pd.read_csv('rate_my_course.csv')
# Change the default numerical row labels to be the discipline name.
named_ratings = ratings.set_index('Discipline')
named_ratings.head()

	Number of Professors	Clarity	Helpfulness	Overall Quality	Easiness
Discipline
English	23343	3.756147	3.821866	3.791364	3.162754
Mathematics	22394	3.487379	3.641526	3.566867	3.063322
Biology	11774	3.608331	3.701530	3.657641	2.710459
Psychology	11179	3.909520	3.887536	3.900949	3.316210
History	11145	3.788818	3.753642	3.773746	3.053803

Remember simple indexing is direct indexing with:

Column labels (DICL).
Boolean Series (DIBS).

Assignment by direct indexing with Column Labels#

# DICL - one column label between square brackets _sets_ a Series.
clarity = named_ratings['Clarity']  # DICL on right hand side (RHS)
helpfulness = named_ratings['Helpfulness']  # DICL on RHS
# DICL on the left hand side (LHS).
# _Sets_ new column in DataFrame
named_ratings['combined'] = clarity + helpfulness
named_ratings

	Number of Professors	Clarity	Helpfulness	Overall Quality	Easiness	combined
Discipline
English	23343	3.756147	3.821866	3.791364	3.162754	7.578013
Mathematics	22394	3.487379	3.641526	3.566867	3.063322	7.128906
Biology	11774	3.608331	3.701530	3.657641	2.710459	7.309862
Psychology	11179	3.909520	3.887536	3.900949	3.316210	7.797055
History	11145	3.788818	3.753642	3.773746	3.053803	7.542460
...	...	...	...	...	...	...
Anatomy	117	3.932991	3.974701	3.954188	2.863504	7.907692
Earth Science	110	3.636182	3.671364	3.655091	3.106727	7.307545
Linguistics	110	3.749000	3.834545	3.798182	3.309636	7.583545
Mechanical Engineering	104	3.441923	3.531154	3.489327	2.799135	6.973077
Medicine	102	3.927255	3.934216	3.929118	3.109118	7.861471

75 rows × 6 columns

Assignment by direct indexing with Boolean Series#

# Create a Boolean Series by doing a comparison on a Series.
not_clear = clarity < 3.5
not_clear

Discipline
English                   False
Mathematics                True
Biology                   False
Psychology                False
History                   False
                          ...  
Anatomy                   False
Earth Science             False
Linguistics               False
Mechanical Engineering     True
Medicine                  False
Name: Clarity, Length: 75, dtype: bool

# DIBS selects *rows*, giving a DataFrame with one row per True
# Here we set all the values in the not-clear rows to be missing
# values.
unclear_ratings = named_ratings.copy()  # Make a copy before changing.
unclear_ratings[not_clear] = np.nan
unclear_ratings

	Number of Professors	Clarity	Helpfulness	Overall Quality	Easiness	combined
Discipline
English	23343.0	3.756147	3.821866	3.791364	3.162754	7.578013
Mathematics	NaN	NaN	NaN	NaN	NaN	NaN
Biology	11774.0	3.608331	3.701530	3.657641	2.710459	7.309862
Psychology	11179.0	3.909520	3.887536	3.900949	3.316210	7.797055
History	11145.0	3.788818	3.753642	3.773746	3.053803	7.542460
...	...	...	...	...	...	...
Anatomy	117.0	3.932991	3.974701	3.954188	2.863504	7.907692
Earth Science	110.0	3.636182	3.671364	3.655091	3.106727	7.307545
Linguistics	110.0	3.749000	3.834545	3.798182	3.309636	7.583545
Mechanical Engineering	NaN	NaN	NaN	NaN	NaN	NaN
Medicine	102.0	3.927255	3.934216	3.929118	3.109118	7.861471

75 rows × 6 columns

Setting values with `.loc`#

# Set values a single row by label.
unclear_plus_english = unclear_ratings.copy()
unclear_plus_english.loc['English'] = np.nan
unclear_plus_english

	Number of Professors	Clarity	Helpfulness	Overall Quality	Easiness	combined
Discipline
English	NaN	NaN	NaN	NaN	NaN	NaN
Mathematics	NaN	NaN	NaN	NaN	NaN	NaN
Biology	11774.0	3.608331	3.701530	3.657641	2.710459	7.309862
Psychology	11179.0	3.909520	3.887536	3.900949	3.316210	7.797055
History	11145.0	3.788818	3.753642	3.773746	3.053803	7.542460
...	...	...	...	...	...	...
Anatomy	117.0	3.932991	3.974701	3.954188	2.863504	7.907692
Earth Science	110.0	3.636182	3.671364	3.655091	3.106727	7.307545
Linguistics	110.0	3.749000	3.834545	3.798182	3.309636	7.583545
Mechanical Engineering	NaN	NaN	NaN	NaN	NaN	NaN
Medicine	102.0	3.927255	3.934216	3.929118	3.109118	7.861471

75 rows × 6 columns

# Set a single value by specifying the row and column label.
unclear_ratings.loc['English', 'Clarity'] = 3
unclear_ratings

	Number of Professors	Clarity	Helpfulness	Overall Quality	Easiness	combined
Discipline
English	23343.0	3.000000	3.821866	3.791364	3.162754	7.578013
Mathematics	NaN	NaN	NaN	NaN	NaN	NaN
Biology	11774.0	3.608331	3.701530	3.657641	2.710459	7.309862
Psychology	11179.0	3.909520	3.887536	3.900949	3.316210	7.797055
History	11145.0	3.788818	3.753642	3.773746	3.053803	7.542460
...	...	...	...	...	...	...
Anatomy	117.0	3.932991	3.974701	3.954188	2.863504	7.907692
Earth Science	110.0	3.636182	3.671364	3.655091	3.106727	7.307545
Linguistics	110.0	3.749000	3.834545	3.798182	3.309636	7.583545
Mechanical Engineering	NaN	NaN	NaN	NaN	NaN	NaN
Medicine	102.0	3.927255	3.934216	3.929118	3.109118	7.861471

75 rows × 6 columns

You can set values with Boolean Series using indirect indexing:

# Select rows where clarity < 3 to 0 using .loc with Boolean Series.
unclear_0 = unclear_ratings.copy()
unclear_0.loc[not_clear] = 0
unclear_0

	Number of Professors	Clarity	Helpfulness	Overall Quality	Easiness	combined
Discipline
English	23343.0	3.000000	3.821866	3.791364	3.162754	7.578013
Mathematics	0.0	0.000000	0.000000	0.000000	0.000000	0.000000
Biology	11774.0	3.608331	3.701530	3.657641	2.710459	7.309862
Psychology	11179.0	3.909520	3.887536	3.900949	3.316210	7.797055
History	11145.0	3.788818	3.753642	3.773746	3.053803	7.542460
...	...	...	...	...	...	...
Anatomy	117.0	3.932991	3.974701	3.954188	2.863504	7.907692
Earth Science	110.0	3.636182	3.671364	3.655091	3.106727	7.307545
Linguistics	110.0	3.749000	3.834545	3.798182	3.309636	7.583545
Mechanical Engineering	0.0	0.000000	0.000000	0.000000	0.000000	0.000000
Medicine	102.0	3.927255	3.934216	3.929118	3.109118	7.861471

75 rows × 6 columns

# Set Easiness column from rows where Clarity < 3
unclear_ratings.loc[not_clear, 'Easiness'] = -100
unclear_ratings

	Number of Professors	Clarity	Helpfulness	Overall Quality	Easiness	combined
Discipline
English	23343.0	3.000000	3.821866	3.791364	3.162754	7.578013
Mathematics	NaN	NaN	NaN	NaN	-100.000000	NaN
Biology	11774.0	3.608331	3.701530	3.657641	2.710459	7.309862
Psychology	11179.0	3.909520	3.887536	3.900949	3.316210	7.797055
History	11145.0	3.788818	3.753642	3.773746	3.053803	7.542460
...	...	...	...	...	...	...
Anatomy	117.0	3.932991	3.974701	3.954188	2.863504	7.907692
Earth Science	110.0	3.636182	3.671364	3.655091	3.106727	7.307545
Linguistics	110.0	3.749000	3.834545	3.798182	3.309636	7.583545
Mechanical Engineering	NaN	NaN	NaN	NaN	-100.000000	NaN
Medicine	102.0	3.927255	3.934216	3.929118	3.109118	7.861471

75 rows × 6 columns

Setting values using `.iloc`#

# Set a single row by position
# The first row (by position) is English.
unclear_ratings.iloc[0] = 99
unclear_ratings

	Number of Professors	Clarity	Helpfulness	Overall Quality	Easiness	combined
Discipline
English	99.0	99.000000	99.000000	99.000000	99.000000	99.000000
Mathematics	NaN	NaN	NaN	NaN	-100.000000	NaN
Biology	11774.0	3.608331	3.701530	3.657641	2.710459	7.309862
Psychology	11179.0	3.909520	3.887536	3.900949	3.316210	7.797055
History	11145.0	3.788818	3.753642	3.773746	3.053803	7.542460
...	...	...	...	...	...	...
Anatomy	117.0	3.932991	3.974701	3.954188	2.863504	7.907692
Earth Science	110.0	3.636182	3.671364	3.655091	3.106727	7.307545
Linguistics	110.0	3.749000	3.834545	3.798182	3.309636	7.583545
Mechanical Engineering	NaN	NaN	NaN	NaN	-100.000000	NaN
Medicine	102.0	3.927255	3.934216	3.929118	3.109118	7.861471

75 rows × 6 columns

# Set a single value by specifying the row and column positions.
# Easiness is the last column (position -1).
unclear_ratings.iloc[0, -1] = 11
unclear_ratings

	Number of Professors	Clarity	Helpfulness	Overall Quality	Easiness	combined
Discipline
English	99.0	99.000000	99.000000	99.000000	99.000000	11.000000
Mathematics	NaN	NaN	NaN	NaN	-100.000000	NaN
Biology	11774.0	3.608331	3.701530	3.657641	2.710459	7.309862
Psychology	11179.0	3.909520	3.887536	3.900949	3.316210	7.797055
History	11145.0	3.788818	3.753642	3.773746	3.053803	7.542460
...	...	...	...	...	...	...
Anatomy	117.0	3.932991	3.974701	3.954188	2.863504	7.907692
Earth Science	110.0	3.636182	3.671364	3.655091	3.106727	7.307545
Linguistics	110.0	3.749000	3.834545	3.798182	3.309636	7.583545
Mechanical Engineering	NaN	NaN	NaN	NaN	-100.000000	NaN
Medicine	102.0	3.927255	3.934216	3.929118	3.109118	7.861471

75 rows × 6 columns

Use indirect indexing to set values into a Series#

You can do assignment by indexing into Series values. Use indirect indexing to do this.

# Remind us what clarity values are
clarity

Discipline
English                   3.756147
Mathematics               3.487379
Biology                   3.608331
Psychology                3.909520
History                   3.788818
                            ...   
Anatomy                   3.932991
Earth Science             3.636182
Linguistics               3.749000
Mechanical Engineering    3.441923
Medicine                  3.927255
Name: Clarity, Length: 75, dtype: float64

# Assign by label.
clarity.loc['English'] = 3
clarity

Discipline
English                   3.000000
Mathematics               3.487379
Biology                   3.608331
Psychology                3.909520
History                   3.788818
                            ...   
Anatomy                   3.932991
Earth Science             3.636182
Linguistics               3.749000
Mechanical Engineering    3.441923
Medicine                  3.927255
Name: Clarity, Length: 75, dtype: float64

# Set using Boolean Series.
clarity.loc[not_clear] = 0
clarity

Discipline
English                   3.000000
Mathematics               0.000000
Biology                   3.608331
Psychology                3.909520
History                   3.788818
                            ...   
Anatomy                   3.932991
Earth Science             3.636182
Linguistics               3.749000
Mechanical Engineering    0.000000
Medicine                  3.927255
Name: Clarity, Length: 75, dtype: float64

# Set by position.
# English is the first value (at position 0).
clarity.iloc[0] = 11
clarity

Discipline
English                   11.000000
Mathematics                0.000000
Biology                    3.608331
Psychology                 3.909520
History                    3.788818
                            ...    
Anatomy                    3.932991
Earth Science              3.636182
Linguistics                3.749000
Mechanical Engineering     0.000000
Medicine                   3.927255
Name: Clarity, Length: 75, dtype: float64

Assignment with indexing

Contents

Assignment with indexing#

Assignment by direct indexing with Column Labels#

Assignment by direct indexing with Boolean Series#

Setting values with .loc#

Setting values using .iloc#

Use indirect indexing to set values into a Series#

Setting values with `.loc`#

Setting values using `.iloc`#