Using minimize#

We have already used the minimize function a few times:

from scipy.optimize import minimize

Now we pause to look at how it works, and how best to use it.

Let’s go back to the problem from finding lines:

# Our usual imports and configuration.
import numpy as np
import pandas as pd
pd.set_option('mode.copy_on_write', True)
import matplotlib.pyplot as plt
# Make plots look a little bit more fancy
plt.style.use('fivethirtyeight')

We used the students ratings dataset. Download the data file via rate_my_course.csv.

We were looking for the best slope to relate the Easiness ratings to the Overall Quality ratings.

# Read the data file, get columns as arrays.
ratings = pd.read_csv('rate_my_course.csv')
easiness = np.array(ratings['Easiness'])
quality = np.array(ratings['Overall Quality'])
plt.plot(easiness, quality, 'o')
plt.xlabel('Easiness')
plt.ylabel('Overall quality')
Text(0, 0.5, 'Overall quality')
[Figure: scatter plot of Overall quality against Easiness.]

Here is the function we used to calculate root mean squared error (RMSE), adapted for minimize.

def calc_rmse_for_minimize(c_s):
    # c_s has two elements, the intercept c and the slope s.
    c = c_s[0]
    s = c_s[1]
    predicted_quality = c + easiness * s
    errors = quality - predicted_quality
    return np.sqrt(np.mean(errors ** 2))
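
To see what the last line of this function computes, here is a minimal sketch that works through the RMSE steps one at a time. The four values are made up for illustration and are not from the ratings data; the names y_actual and y_predicted are invented for this example.

```python
import numpy as np

# Made-up values for illustration only (not from the ratings dataset).
y_actual = np.array([1.0, 2.0, 3.0, 4.0])
y_predicted = np.array([2.0, 2.0, 2.0, 2.0])

errors = y_actual - y_predicted    # [-1, 0, 1, 2]
squared = errors ** 2              # [1, 0, 1, 4]
mean_squared = np.mean(squared)    # 6 / 4 = 1.5
rmse = np.sqrt(mean_squared)       # The square root of 1.5.
print(rmse)
```

Larger errors in either direction give a larger RMSE, so the smaller the value, the closer the predictions are to the actual values.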

We called minimize to find the best intercept and slope:

min_res = minimize(calc_rmse_for_minimize, [2.25, 0.47])
min_res
  message: Optimization terminated successfully.
  success: True
   status: 0
      fun: 0.11930057876222537
        x: [ 2.115e+00  5.089e-01]
      nit: 4
      jac: [-2.442e-06  1.345e-06]
 hess_inv: [[ 1.773e+01 -5.527e+00]
            [-5.527e+00  1.735e+00]]
     nfev: 24
     njev: 8

The result of minimize#

Notice that minimize returns a special kind of value that stores information about the result:

# The value that comes back is a special thing to contain minimize results:
type(min_res)
scipy.optimize._optimize.OptimizeResult

Among the interesting things that the result contains is the attribute fun. This value is the final minimal value of the function we are trying to minimize; here, that is the RMSE given an intercept and slope:

# The minimal value that `minimize` could find for our function.
min_res.fun
0.11930057876222537

We also have an array with the values for the intercept and slope that give the minimal value:

min_res.x
array([2.11475042, 0.50887914])

We confirm that min_res.fun is indeed the value we get from our function given the intercept, slope array in min_res.x:

calc_rmse_for_minimize(min_res.x)
0.11930057876222537
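
The result has other attributes that can be worth a look, such as success, message and nfev, all of which appear in the display above. The following is a minimal sketch of checking them before using the parameters. It uses synthetic data rather than the ratings data, and the names x_vals, y_vals and rmse_synthetic are invented for this illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data for illustration: y is roughly 2 + 0.5 * x, plus noise.
rng = np.random.default_rng(42)
x_vals = np.linspace(1, 5, 50)
y_vals = 2 + 0.5 * x_vals + rng.normal(0, 0.1, size=50)

def rmse_synthetic(c_s):
    # c_s has two elements, the intercept c and the slope s.
    c = c_s[0]
    s = c_s[1]
    errors = y_vals - (c + x_vals * s)
    return np.sqrt(np.mean(errors ** 2))

res = minimize(rmse_synthetic, [1.0, 1.0])

# Check the outcome before trusting the parameters.
if res.success:
    print('Converged:', res.message)
else:
    print('Possible problem:', res.message)
print('Best parameters:', res.x)
print('RMSE at best parameters:', res.fun)
print('Number of function calls:', res.nfev)
```

If success is False, the message usually gives a hint about what went wrong, and the values in x should be treated with some suspicion.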

The function to minimize#

calc_rmse_for_minimize is a function:

type(calc_rmse_for_minimize)
function

We pass the function to minimize as an argument for it to use.

The function we pass to minimize is called the objective function, in the sense that our objective is to minimize this function.

How is minimize using our objective function? Clearly minimize is calling the function, but what arguments is it sending? How many times does it call the function?

To find out, we can stick something inside the objective function to tell us each time it is called:

def rmse_func_for_info(c_s):
    # Print the argument that minimize sent.
    print('Called with', c_s, 'of type', type(c_s))
    # The rest of the function is the same as the original above.
    # c_s has two elements, the intercept c and the slope s.
    c = c_s[0]
    s = c_s[1]
    predicted_quality = c + easiness * s
    errors = quality - predicted_quality
    return np.sqrt(np.mean(errors ** 2))

Then we call minimize with the new objective function:

min_res = minimize(rmse_func_for_info, [2.25, 0.47])
min_res
Called with [2.25 0.47] of type <class 'numpy.ndarray'>
Called with [2.25000001 0.47      ] of type <class 'numpy.ndarray'>
Called with [2.25       0.47000001] of type <class 'numpy.ndarray'>
Called with [2.15529163 0.19056832] of type <class 'numpy.ndarray'>
Called with [2.15529164 0.19056832] of type <class 'numpy.ndarray'>
Called with [2.15529163 0.19056833] of type <class 'numpy.ndarray'>
Called with [2.24765108 0.46306966] of type <class 'numpy.ndarray'>
Called with [2.2476511  0.46306966] of type <class 'numpy.ndarray'>
Called with [2.24765108 0.46306967] of type <class 'numpy.ndarray'>
Called with [2.24898418 0.46700288] of type <class 'numpy.ndarray'>
Called with [2.24898419 0.46700288] of type <class 'numpy.ndarray'>
Called with [2.24898418 0.46700289] of type <class 'numpy.ndarray'>
Called with [2.24209018 0.46915035] of type <class 'numpy.ndarray'>
Called with [2.24209019 0.46915035] of type <class 'numpy.ndarray'>
Called with [2.24209018 0.46915036] of type <class 'numpy.ndarray'>
Called with [2.21451417 0.47774023] of type <class 'numpy.ndarray'>
Called with [2.21451418 0.47774023] of type <class 'numpy.ndarray'>
Called with [2.21451417 0.47774025] of type <class 'numpy.ndarray'>
Called with [2.13294898 0.50319907] of type <class 'numpy.ndarray'>
Called with [2.132949   0.50319907] of type <class 'numpy.ndarray'>
Called with [2.13294898 0.50319909] of type <class 'numpy.ndarray'>
Called with [2.11475042 0.50887914] of type <class 'numpy.ndarray'>
Called with [2.11475043 0.50887914] of type <class 'numpy.ndarray'>
Called with [2.11475042 0.50887916] of type <class 'numpy.ndarray'>
  message: Optimization terminated successfully.
  success: True
   status: 0
      fun: 0.11930057876222537
        x: [ 2.115e+00  5.089e-01]
      nit: 4
      jac: [-2.442e-06  1.345e-06]
 hess_inv: [[ 1.773e+01 -5.527e+00]
            [-5.527e+00  1.735e+00]]
     nfev: 24
     njev: 8

This shows that:

  • minimize calls our function multiple times, as it searches for the values of intercept and slope giving the minimum RMSE.

  • At each call, it passes a single argument that is an array containing the two values (intercept and slope).

Looking carefully, we see signs that minimize is trying small changes in the slope or intercept, presumably to calculate the gradient — as we saw in the optimization page — but that is not our concern here.

The values in the array that minimize passes are the parameters that minimize is trying to optimize — in our case, the intercept and slope. Call this the parameter array.
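
Printing inside the objective function can get noisy. As an alternative sketch, we could record each parameter array in a list and inspect the list afterwards. The data here are made up for illustration, and the names calls and rmse_recording are invented for this example.

```python
import numpy as np
from scipy.optimize import minimize

# Made-up data for illustration only.
x_vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_vals = np.array([2.4, 3.1, 3.4, 4.1, 4.4])

# A list to record a copy of each parameter array that minimize sends.
calls = []

def rmse_recording(c_s):
    calls.append(np.array(c_s))  # np.array makes a copy of the array.
    c = c_s[0]
    s = c_s[1]
    errors = y_vals - (c + x_vals * s)
    return np.sqrt(np.mean(errors ** 2))

res = minimize(rmse_recording, [2.0, 0.5])
print('Number of recorded calls:', len(calls))
print('nfev reported by minimize:', res.nfev)
print('First call:', calls[0])
print('Last call:', calls[-1])
```

In the run shown above, the number of printed calls (24) matched the nfev value in the result; the two counts printed here let you check whether the same holds on your own installation.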

Tricks for using minimize#

Unpacking#

Near the top of our function, we have these two lines of code:

    c = c_s[0]
    s = c_s[1]

This is where we set the intercept from the first value of the parameter array, and the slope from the second.

It turns out there is a neat and versatile way of doing this in Python, called unpacking. Consider this array:

c_s = np.array([2, 1])

We can unpack these two values into variables like this:

# Unpacking!
c, s = c_s

The right hand side contains two values (in an array). The left hand side has two variable names, separated by a comma. Python takes the two values from the right hand side, and puts them into the variables on the left:

print('c is', c)
print('s is', s)
c is 2
s is 1

The thing on the right hand side can be anything that is a sequence of two things. For example, it can also be a list:

my_list = [6, 7]
my_var1, my_var2 = my_list
print('my_var1 is', my_var1)
print('my_var2 is', my_var2)
my_var1 is 6
my_var2 is 7

There can be three or four or any other number of variables on the left, as long as there is a matching number of elements in the thing on the right:

list2 = [10, 100, 10000]
w, x, y = list2
print('w is', w)
print('x is', x)
print('y is', y)
w is 10
x is 100
y is 10000

The number of elements must match the number of variables:

# Error - three variables on the left, two elements on the right.
p, q, r = [1, 2]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[18], line 2
      1 # Error - three variables on the left, two elements on the right.
----> 2 p, q, r = [1, 2]

ValueError: not enough values to unpack (expected 3, got 2)
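
The mismatch can go in either direction: too few values or too many. As a small sketch, the function below tries to unpack whatever it is given into two names and reports what happened. The function name try_unpack_pair is invented for this illustration, and it uses try and except to catch the error instead of stopping.

```python
def try_unpack_pair(values):
    # Hypothetical helper: report whether `values` unpacks into two names.
    try:
        first, second = values
    except ValueError as err:
        return 'Cannot unpack: ' + str(err)
    return 'Unpacked ' + str(first) + ' and ' + str(second)

print(try_unpack_pair([1, 2]))     # Two values, two names: works.
print(try_unpack_pair([1, 2, 3]))  # Too many values.
print(try_unpack_pair([1]))        # Not enough values.
```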

Using unpacking in minimize#

Unpacking has two good uses with minimize. The first is that we can make our function to minimize a little neater:

def rmse_with_unpacking(c_s):
    # c_s has two elements, the intercept c and the slope s.
    # Use unpacking!
    c, s = c_s
    predicted_quality = c + easiness * s
    errors = quality - predicted_quality
    return np.sqrt(np.mean(errors ** 2))

The second is, we can use unpacking to — er — unpack the best fit intercept and slope from the result of minimize. Remember, minimize returns a result value, that includes an array x with the parameters minimizing our function:

m_r = minimize(rmse_with_unpacking, [2.25, 0.47])
m_r
  message: Optimization terminated successfully.
  success: True
   status: 0
      fun: 0.11930057876222537
        x: [ 2.115e+00  5.089e-01]
      nit: 4
      jac: [-2.442e-06  1.345e-06]
 hess_inv: [[ 1.773e+01 -5.527e+00]
            [-5.527e+00  1.735e+00]]
     nfev: 24
     njev: 8

We can get the best-fit intercept and slope values by unpacking:

best_c, best_s = m_r.x
print(best_c)
print(best_s)
2.114750415469181
0.5088791428878439
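
One way to sanity-check unpacked values like these is to compare them with a different route to the same line. The sketch below uses np.polyfit, which fits a least-squares line directly; minimizing RMSE and minimizing the sum of squared errors pick out the same line, so the two answers should nearly agree. The data are synthetic, and the names x_vals, y_vals and rmse_line are invented for this illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
x_vals = np.linspace(1, 5, 40)
y_vals = 2 + 0.5 * x_vals + rng.normal(0, 0.2, size=40)

def rmse_line(c_s):
    # Unpack the intercept and slope from the parameter array.
    c, s = c_s
    errors = y_vals - (c + x_vals * s)
    return np.sqrt(np.mean(errors ** 2))

# Unpack the best-fit intercept and slope from the result.
best_c, best_s = minimize(rmse_line, [1.0, 1.0]).x

# np.polyfit returns coefficients highest power first: slope, then intercept.
fit_s, fit_c = np.polyfit(x_vals, y_vals, 1)
print('minimize:', best_c, best_s)
print('polyfit: ', fit_c, fit_s)
```

Notice that unpacking is doing the work in both cases, and that the order of the values matters: minimize gives back intercept then slope because that is the order our function expects, whereas np.polyfit gives slope then intercept.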

Other arguments to our function#

At the moment, using minimize is a little inconvenient, because we have to make a separate cost (objective) function for each set of x and y values.

Here is our new, neater cost function to find the best slope and intercept for easiness and quality:

def cost_easy_quality(c_s):
    # c_s has two elements, the intercept c and the slope s.
    c, s = c_s
    predicted_quality = c + easiness * s
    errors = quality - predicted_quality
    return np.sqrt(np.mean(errors ** 2))

We find the intercept and slope — and get the usual values.

# Show the best intercept and slope for easiness and quality.
minimize(cost_easy_quality, [2.25, 0.47]).x
array([2.11475042, 0.50887914])

Now let’s imagine we are interested in the relationship of easiness and clarity:

# Get the Clarity ratings as an array.
clarity = np.array(ratings['Clarity'])
clarity[:10]

Notice that our cost_easy_quality function uses the top-level notebook variables easiness and quality within the function. It can only work on these variables, and no others. So, if we want to get the slope and intercept for easiness and clarity, instead of easiness and quality, we have to write another almost-identical function, like this:

def cost_easy_clarity(c_s):
    # c_s has two elements, the intercept c and the slope s.
    c, s = c_s
    predicted_clarity = c + easiness * s
    errors = clarity - predicted_clarity
    return np.sqrt(np.mean(errors ** 2))

# Show the best intercept and slope for easiness and clarity.
minimize(cost_easy_clarity, [2.25, 0.47]).x

How can we avoid writing such near-identical functions for each set of x and y values for which we want a best-fit line?

Generalizing the cost-function with the args argument#

Inspect the help for minimize with minimize?. You will notice that there is an argument we can pass to minimize called args. These are “Extra arguments passed to the objective function”. In fact, this is very useful to make our objective function more general. First we try this silly use of args where we will pass a couple of useless bits of text to our objective function:

def rmse_with_extra_args(c_s, v1, v2):
    # An objective function with some useless extra arguments.
    print('c_s is', c_s, '; v1 is', v1, '; v2 is', v2)
    c, s = c_s
    predicted_quality = c + easiness * s
    errors = quality - predicted_quality
    return np.sqrt(np.mean(errors ** 2))

Now we tell minimize to pass some value for v1 and v2 for every call to the objective function. We do that by passing v1 and v2 inside a tuple. A tuple is a data type in Python that is much like a list, in that it contains a sequence of values, but we create tuples using parentheses () rather than square brackets [].

Here is our argument tuple:

v1 = 'Doctor'
v2 = 'Strange'
# Make the argument tuple. Notice the parentheses.
extras = (v1, v2)
extras
('Doctor', 'Strange')
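
As a brief sketch of how tuples behave, the code below compares a tuple with a list, and shows one wrinkle worth knowing if you later want to pass a single extra argument: a one-element tuple needs a trailing comma. The variable names are invented for this illustration.

```python
# A tuple with two elements, and the same values in a list.
pair_tuple = ('Doctor', 'Strange')
pair_list = ['Doctor', 'Strange']
print(type(pair_tuple), type(pair_list))

# Indexing and unpacking work the same way for tuples as for lists.
first, second = pair_tuple
print(pair_tuple[0], second)

# A one-element tuple needs a trailing comma; parentheses alone are not enough.
one_tuple = ('Doctor',)
not_a_tuple = ('Doctor')
print(type(one_tuple), type(not_a_tuple))
```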

Here we pass the argument tuple to minimize, to ask minimize to pass these extra arguments to the objective function, each time it calls the function:

# args has two values, one that becomes "v1" and the other that becomes "v2".
minimize(rmse_with_extra_args, [2.25, 0.47], args=extras)
c_s is [2.25 0.47] ; v1 is Doctor ; v2 is Strange
c_s is [2.25000001 0.47      ] ; v1 is Doctor ; v2 is Strange
c_s is [2.25       0.47000001] ; v1 is Doctor ; v2 is Strange
c_s is [2.15529163 0.19056832] ; v1 is Doctor ; v2 is Strange
c_s is [2.15529164 0.19056832] ; v1 is Doctor ; v2 is Strange
c_s is [2.15529163 0.19056833] ; v1 is Doctor ; v2 is Strange
c_s is [2.24765108 0.46306966] ; v1 is Doctor ; v2 is Strange
c_s is [2.2476511  0.46306966] ; v1 is Doctor ; v2 is Strange
c_s is [2.24765108 0.46306967] ; v1 is Doctor ; v2 is Strange
c_s is [2.24898418 0.46700288] ; v1 is Doctor ; v2 is Strange
c_s is [2.24898419 0.46700288] ; v1 is Doctor ; v2 is Strange
c_s is [2.24898418 0.46700289] ; v1 is Doctor ; v2 is Strange
c_s is [2.24209018 0.46915035] ; v1 is Doctor ; v2 is Strange
c_s is [2.24209019 0.46915035] ; v1 is Doctor ; v2 is Strange
c_s is [2.24209018 0.46915036] ; v1 is Doctor ; v2 is Strange
c_s is [2.21451417 0.47774023] ; v1 is Doctor ; v2 is Strange
c_s is [2.21451418 0.47774023] ; v1 is Doctor ; v2 is Strange
c_s is [2.21451417 0.47774025] ; v1 is Doctor ; v2 is Strange
c_s is [2.13294898 0.50319907] ; v1 is Doctor ; v2 is Strange
c_s is [2.132949   0.50319907] ; v1 is Doctor ; v2 is Strange
c_s is [2.13294898 0.50319909] ; v1 is Doctor ; v2 is Strange
c_s is [2.11475042 0.50887914] ; v1 is Doctor ; v2 is Strange
c_s is [2.11475043 0.50887914] ; v1 is Doctor ; v2 is Strange
c_s is [2.11475042 0.50887916] ; v1 is Doctor ; v2 is Strange
  message: Optimization terminated successfully.
  success: True
   status: 0
      fun: 0.11930057876222537
        x: [ 2.115e+00  5.089e-01]
      nit: 4
      jac: [-2.442e-06  1.345e-06]
 hess_inv: [[ 1.773e+01 -5.527e+00]
            [-5.527e+00  1.735e+00]]
     nfev: 24
     njev: 8

To be more compact, we can assemble and pass the args argument in one go:

# Compiling and passing args in one line.
minimize(rmse_with_extra_args, [2.25, 0.47], args=('Doctor', 'Strange'))
c_s is [2.25 0.47] ; v1 is Doctor ; v2 is Strange
c_s is [2.25000001 0.47      ] ; v1 is Doctor ; v2 is Strange
c_s is [2.25       0.47000001] ; v1 is Doctor ; v2 is Strange
c_s is [2.15529163 0.19056832] ; v1 is Doctor ; v2 is Strange
c_s is [2.15529164 0.19056832] ; v1 is Doctor ; v2 is Strange
c_s is [2.15529163 0.19056833] ; v1 is Doctor ; v2 is Strange
c_s is [2.24765108 0.46306966] ; v1 is Doctor ; v2 is Strange
c_s is [2.2476511  0.46306966] ; v1 is Doctor ; v2 is Strange
c_s is [2.24765108 0.46306967] ; v1 is Doctor ; v2 is Strange
c_s is [2.24898418 0.46700288] ; v1 is Doctor ; v2 is Strange
c_s is [2.24898419 0.46700288] ; v1 is Doctor ; v2 is Strange
c_s is [2.24898418 0.46700289] ; v1 is Doctor ; v2 is Strange
c_s is [2.24209018 0.46915035] ; v1 is Doctor ; v2 is Strange
c_s is [2.24209019 0.46915035] ; v1 is Doctor ; v2 is Strange
c_s is [2.24209018 0.46915036] ; v1 is Doctor ; v2 is Strange
c_s is [2.21451417 0.47774023] ; v1 is Doctor ; v2 is Strange
c_s is [2.21451418 0.47774023] ; v1 is Doctor ; v2 is Strange
c_s is [2.21451417 0.47774025] ; v1 is Doctor ; v2 is Strange
c_s is [2.13294898 0.50319907] ; v1 is Doctor ; v2 is Strange
c_s is [2.132949   0.50319907] ; v1 is Doctor ; v2 is Strange
c_s is [2.13294898 0.50319909] ; v1 is Doctor ; v2 is Strange
c_s is [2.11475042 0.50887914] ; v1 is Doctor ; v2 is Strange
c_s is [2.11475043 0.50887914] ; v1 is Doctor ; v2 is Strange
c_s is [2.11475042 0.50887916] ; v1 is Doctor ; v2 is Strange
  message: Optimization terminated successfully.
  success: True
   status: 0
      fun: 0.11930057876222537
        x: [ 2.115e+00  5.089e-01]
      nit: 4
      jac: [-2.442e-06  1.345e-06]
 hess_inv: [[ 1.773e+01 -5.527e+00]
            [-5.527e+00  1.735e+00]]
     nfev: 24
     njev: 8

Notice that, each time minimize calls rmse_with_extra_args it passes the first value of args as the second argument to the objective function (v1) and the second value of args as the third argument to the objective function (v2).
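
One way to picture this, as an illustration rather than a description of how SciPy is written internally, is Python's star syntax for calling a function. A star before a tuple in a function call unpacks the tuple into separate arguments. The function show_arguments is a made-up stand-in for an objective function.

```python
def show_arguments(params, v1, v2):
    # Hypothetical stand-in for an objective function.
    return 'params=' + str(params) + ', v1=' + v1 + ', v2=' + v2

extras = ('Doctor', 'Strange')
params = [2.25, 0.47]

# The star unpacks the tuple into separate positional arguments,
# so these two calls are equivalent.
with_star = show_arguments(params, *extras)
written_out = show_arguments(params, 'Doctor', 'Strange')
print(with_star)
print(with_star == written_out)
```

The parameter array always comes first, and the values from args follow in order.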

args is the solution to our problem of near-identical functions for different x and y values - we can use args to pass the x and y values for our function to work on:

def rmse_any_line(c_s, x_values, y_values):
    c, s = c_s
    predicted = c + x_values * s
    errors = y_values - predicted
    return np.sqrt(np.mean(errors ** 2))

Now we can use the same objective function for any pair of x and y values:

minimize(rmse_any_line, [2.25, 0.47], args=(easiness, quality))
  message: Optimization terminated successfully.
  success: True
   status: 0
      fun: 0.11930057876222537
        x: [ 2.115e+00  5.089e-01]
      nit: 4
      jac: [-2.442e-06  1.345e-06]
 hess_inv: [[ 1.773e+01 -5.527e+00]
            [-5.527e+00  1.735e+00]]
     nfev: 24
     njev: 8
minimize(rmse_any_line, [2.25, 0.47], args=(easiness, clarity))
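
Because the same objective function now works for any x and y values, we could go one step further and loop over several sets of y values. The following is a minimal sketch using synthetic data; the names y_sets, y_a, y_b and fits are invented for this illustration, and rmse_any_line is repeated so the example runs on its own.

```python
import numpy as np
from scipy.optimize import minimize

def rmse_any_line(c_s, x_values, y_values):
    c, s = c_s
    predicted = c + x_values * s
    errors = y_values - predicted
    return np.sqrt(np.mean(errors ** 2))

# Synthetic data for illustration only: two made-up y variables.
rng = np.random.default_rng(1)
x_vals = np.linspace(1, 5, 60)
y_sets = {
    'y_a': 2 + 0.5 * x_vals + rng.normal(0, 0.1, size=60),
    'y_b': 1 + 0.8 * x_vals + rng.normal(0, 0.1, size=60),
}

# One objective function, reused for each set of y values via args.
fits = {}
for name, y_vals in y_sets.items():
    res = minimize(rmse_any_line, [1.0, 1.0], args=(x_vals, y_vals))
    fits[name] = res.x
    print(name, 'intercept and slope:', res.x)
```

With the ratings data, the same pattern would apply to columns of the data frame in place of the made-up arrays.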