Numpy, arrays, and vectors

import numpy as np
import pandas as pd
pd.set_option('mode.copy_on_write', True)
import matplotlib.pyplot as plt

top_15 = pd.read_csv('data/Duncan_Occupational_Prestige.csv').head(15)
top_15

	name	type	income	education	prestige
0	accountant	prof	62	86	82
1	pilot	prof	72	76	83
2	architect	prof	75	92	90
3	author	prof	55	90	76
4	chemist	prof	64	86	90
5	minister	prof	21	84	87
6	professor	prof	64	93	93
7	dentist	prof	80	100	90
8	reporter	wc	67	87	52
9	engineer	prof	72	86	88
10	undertaker	prof	42	74	57
11	lawyer	prof	76	98	89
12	physician	prof	76	97	97
13	welfare.worker	prof	41	84	59
14	teacher	prof	48	91	73

x = np.array(top_15['income'])
x

array([62, 72, 75, 55, 64, 21, 64, 80, 67, 72, 42, 76, 76, 41, 48])

y = np.array(top_15['prestige'])
y

array([82, 83, 90, 76, 90, 87, 93, 90, 52, 88, 57, 89, 97, 59, 73])

# Plot prestige (y) as a function of income (x).
plt.scatter(x, y)

<matplotlib.collections.PathCollection at 0x7f22897be150>

[Figure: scatter plot of prestige (y) against income (x)]

Let’s guess a slope and intercept. First we replot the data with the origin (0, 0) included in the axes:

plt.scatter(x, y)
# Put 0, 0 on the plot.
x_min, x_max, y_min, y_max = plt.axis()
limits = [0, x_max, 0, y_max]
plt.axis(limits);

[Figure: the same scatter plot with the axes extended to include (0, 0)]

# Our guesses.
b = 0.7
c = 30

# The fitted values
y_hat = b * x + c

plt.scatter(x, y)
plt.plot(x, y_hat, 'ro')
# Put 0, 0 on the plot.
plt.axis(limits);

[Figure: scatter plot of the data with the fitted values shown as red circles]

Remember the notation:

\[ \vec{x} = [x_1, x_2, \ldots, x_n] \]

\[ \vec{y} = [y_1, y_2, \ldots, y_n] \]

The 1D array x is Python’s representation of $\vec{x}$, and y is $\vec{y}$.

We calculate our fitted values $\hat{\vec{y}}$ as

\[ \hat{\vec{y}} = b \vec{x} + c \]

$b$ and $c$ are single values (scalars).

Notice the notation above. It assumes that multiplying a vector $\vec{x}$ by a scalar $b$ has the effect of multiplying each value in $\vec{x}$ by $b$.

The result of $b \vec{x}$ is another vector that has values $[b x_1, b x_2, \ldots, b x_n]$.

Not coincidentally, this is also what Numpy understands by multiplying the array by the scalar:

bx = b * x
bx

array([43.4, 50.4, 52.5, 38.5, 44.8, 14.7, 44.8, 56. , 46.9, 50.4, 29.4,
       53.2, 53.2, 28.7, 33.6])
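As a quick check of the element-by-element reading, here is a minimal sketch that compares `b * x` with an explicit loop over the elements. The income values are copied by hand from the array output earlier on this page, so the block runs without the data file, and the name `bx_by_loop` is ours, for illustration only.

```python
import numpy as np

# Income values copied from the array output shown earlier.
x = np.array([62, 72, 75, 55, 64, 21, 64, 80, 67, 72, 42, 76, 76, 41, 48])
b = 0.7

# Numpy multiplies every element of x by the scalar b.
bx = b * x

# The same calculation written out one element at a time.
bx_by_loop = np.array([b * x_i for x_i in x])

# Compare the two results, allowing for floating point error.
print(np.allclose(bx, bx_by_loop))
```

The loop version is only there to spell out what the single expression `b * x` does; in practice we would use the array expression.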

The same goes for addition. When we add a scalar $c$ to a vector, that has the effect of adding the value $c$ to each value in the vector. So $b \vec{x} + c$ is $[b x_1 + c, b x_2 + c, \ldots, b x_n + c]$.

# Adds c to every value of bx
bx + c

array([73.4, 80.4, 82.5, 68.5, 74.8, 44.7, 74.8, 86. , 76.9, 80.4, 59.4,
       83.2, 83.2, 58.7, 63.6])

We find the same parallels between Numpy and mathematical notation for adding and subtracting vectors.

Remember we write the calculation of the errors with:

\[ \vec{e} = \vec{y} - \hat{\vec{y}} \]

That is, $\vec{e} = [y_1 - \hat{y}_1, y_2 - \hat{y}_2, \ldots, y_n - \hat{y}_n]$.

When we subtract (or add) two vectors, the result is the element by element subtraction of the values in the vectors.

This is Numpy’s idea as well. This means we can write the mathematical formulation more or less directly in code:

# Calculate e vector from y and y_hat
e = y - y_hat
e

array([  8.6,   2.6,   7.5,   7.5,  15.2,  42.3,  18.2,   4. , -24.9,
         7.6,  -2.4,   5.8,  13.8,   0.3,   9.4])
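To make the element-by-element subtraction concrete, the sketch below recomputes the errors with an explicit loop and checks that adding the errors back to the fitted values recovers $\vec{y}$. The data values are copied by hand from the outputs above, and `e_by_loop` is a name we have made up for this illustration.

```python
import numpy as np

# Income (x) and prestige (y) values copied from the outputs above.
x = np.array([62, 72, 75, 55, 64, 21, 64, 80, 67, 72, 42, 76, 76, 41, 48])
y = np.array([82, 83, 90, 76, 90, 87, 93, 90, 52, 88, 57, 89, 97, 59, 73])

# The guessed slope and intercept from above.
b = 0.7
c = 30

# Fitted values and errors, written as in the mathematical notation.
y_hat = b * x + c
e = y - y_hat

# The same subtraction, one pair of elements at a time.
e_by_loop = np.array([y_i - y_hat_i for y_i, y_hat_i in zip(y, y_hat)])
print(np.allclose(e, e_by_loop))

# Adding the errors back to the fitted values gives y again.
print(np.allclose(y_hat + e, y))
```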

Arrays and lists

You’ve learned that Numpy uses the standard mathematical logic for addition and multiplication of vectors.

You may remember that Python lists aren’t designed for the same purpose, and they have a different logic for addition and multiplication.

x_as_list = list(x)
x_as_list

[np.int64(62),
 np.int64(72),
 np.int64(75),
 np.int64(55),
 np.int64(64),
 np.int64(21),
 np.int64(64),
 np.int64(80),
 np.int64(67),
 np.int64(72),
 np.int64(42),
 np.int64(76),
 np.int64(76),
 np.int64(41),
 np.int64(48)]

Adding a scalar to a list causes an error:

x_as_list + c

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[14], line 1
----> 1 x_as_list + c

TypeError: can only concatenate list (not "int") to list
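If we did want to add a scalar to every element of a list, we would have to do it one element at a time, for example with a list comprehension. The sketch below assumes a list of plain Python integers (the income values typed in by hand) and compares the result with the Numpy version; `x_plus_c_list` and `x_plus_c_array` are names invented for this example.

```python
import numpy as np

# Income values as a plain Python list.
x_as_list = [62, 72, 75, 55, 64, 21, 64, 80, 67, 72, 42, 76, 76, 41, 48]
c = 30

# With a list, we add c to each element ourselves.
x_plus_c_list = [value + c for value in x_as_list]

# With an array, Numpy adds c to each element for us.
x_plus_c_array = np.array(x_as_list) + c

# Both approaches give the same values.
print(x_plus_c_list == x_plus_c_array.tolist())
```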

Multiplying a list by an integer $p$ repeats the list $p$ times:

x_as_list * 3

[np.int64(62),
 np.int64(72),
 np.int64(75),
 np.int64(55),
 np.int64(64),
 np.int64(21),
 np.int64(64),
 np.int64(80),
 np.int64(67),
 np.int64(72),
 np.int64(42),
 np.int64(76),
 np.int64(76),
 np.int64(41),
 np.int64(48),
 np.int64(62),
 np.int64(72),
 np.int64(75),
 np.int64(55),
 np.int64(64),
 np.int64(21),
 np.int64(64),
 np.int64(80),
 np.int64(67),
 np.int64(72),
 np.int64(42),
 np.int64(76),
 np.int64(76),
 np.int64(41),
 np.int64(48),
 np.int64(62),
 np.int64(72),
 np.int64(75),
 np.int64(55),
 np.int64(64),
 np.int64(21),
 np.int64(64),
 np.int64(80),
 np.int64(67),
 np.int64(72),
 np.int64(42),
 np.int64(76),
 np.int64(76),
 np.int64(41),
 np.int64(48)]
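For contrast, here is a small sketch of the same multiplication on a list and on an array. It assumes the income values typed in by hand; the names `repeated` and `tripled` are ours.

```python
import numpy as np

# Income values as a plain Python list, and as an array.
x_as_list = [62, 72, 75, 55, 64, 21, 64, 80, 67, 72, 42, 76, 76, 41, 48]
x = np.array(x_as_list)

# List multiplication repeats the list: 3 copies of 15 elements.
repeated = x_as_list * 3
print(len(repeated))  # 45

# Array multiplication multiplies each element: still 15 elements.
tripled = x * 3
print(len(tripled))  # 15
print(tripled[:3])  # [186 216 225]
```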

Adding two lists concatenates the lists, giving a single list with the elements of the first list followed by the elements of the second:

y_as_list = list(y)
y_as_list

[np.int64(82),
 np.int64(83),
 np.int64(90),
 np.int64(76),
 np.int64(90),
 np.int64(87),
 np.int64(93),
 np.int64(90),
 np.int64(52),
 np.int64(88),
 np.int64(57),
 np.int64(89),
 np.int64(97),
 np.int64(59),
 np.int64(73)]

both = x_as_list + y_as_list
both

[np.int64(62),
 np.int64(72),
 np.int64(75),
 np.int64(55),
 np.int64(64),
 np.int64(21),
 np.int64(64),
 np.int64(80),
 np.int64(67),
 np.int64(72),
 np.int64(42),
 np.int64(76),
 np.int64(76),
 np.int64(41),
 np.int64(48),
 np.int64(82),
 np.int64(83),
 np.int64(90),
 np.int64(76),
 np.int64(90),
 np.int64(87),
 np.int64(93),
 np.int64(90),
 np.int64(52),
 np.int64(88),
 np.int64(57),
 np.int64(89),
 np.int64(97),
 np.int64(59),
 np.int64(73)]

len(both)

30
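The array version of `+` behaves differently. A minimal sketch, assuming the income and prestige values typed in by hand: adding two arrays gives element-by-element sums, and joining arrays end to end needs an explicit call to `np.concatenate`. The names `summed` and `joined` are for illustration only.

```python
import numpy as np

# Income (x) and prestige (y) values copied from the outputs above.
x = np.array([62, 72, 75, 55, 64, 21, 64, 80, 67, 72, 42, 76, 76, 41, 48])
y = np.array([82, 83, 90, 76, 90, 87, 93, 90, 52, 88, 57, 89, 97, 59, 73])

# Adding lists concatenates them: 15 + 15 = 30 elements.
both = list(x) + list(y)
print(len(both))  # 30

# Adding arrays adds matching elements: still 15 elements.
summed = x + y
print(len(summed))  # 15
print(summed[0])  # 144

# To join arrays end to end, ask for it explicitly.
joined = np.concatenate([x, y])
print(len(joined))  # 30
```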

To repeat: Numpy makes addition, subtraction, multiplication and division work on arrays in the way we expect from mathematics. See vector space for the gory details.
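Division has not appeared in the examples so far, so here is a brief sketch showing that it follows the same element-by-element logic. It uses the income and prestige values typed in by hand, and the names `x_halved` and `ratio` are made up for this example.

```python
import numpy as np

# Income (x) and prestige (y) values copied from the outputs above.
x = np.array([62, 72, 75, 55, 64, 21, 64, 80, 67, 72, 42, 76, 76, 41, 48])
y = np.array([82, 83, 90, 76, 90, 87, 93, 90, 52, 88, 57, 89, 97, 59, 73])

# Dividing by a scalar divides every element by that scalar.
x_halved = x / 2
print(x_halved[:3])

# Dividing one array by another divides matching elements.
ratio = y / x
print(np.isclose(ratio[0], 82 / 62))
```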
