Random choice#
# Import the array library
import numpy as np
# Make random number generator.
rng = np.random.default_rng()
We are building up to an answer to the three girls problem. At the end of the lists page, we found a way to simulate one family, and get a count.
Up until now, we have simulated each child in the family by randomly choosing an integer from 0 up to, but not including 2. Therefore, we are randomly choosing between 0 and 1, like this.
one_child = rng.integers(0, 2)
one_child
1
Now we know about function arguments, lists and arrays, we can use a nicer
alternative to do the same thing — rng.choice
.
The choice
method of the Numpy random generator object asks Numpy to choose
from the choices you sent it, at random.
For example, say we want the rng
to choose randomly between 0 and 1, we could
do this:
# Choose randomly between 0 and 1.
alternatives = [0, 1]
rng.choice(alternatives)
0
p
for choices with probabilities#
By default, the choice
method chooses at random from the choice you give,
return each choice with the same probability. In our case, it is returning 0
with probability 0.5 and 1 with probability 0.5.
However, you can use the p
keyword argument to tell the function the probabilities to use for each choice.
For example, let us return to the supreme court jury problem. Remember, we wanted to simulate a jury of 12 people, where each juror has a 26% chance of being Black.
We use 1 to mean Black, and 0 to mean White.
# 1 means Black, 0 means White.
colors = [1, 0]
Now we want to select 1 (Black) with probability 0.26 (26%) and 0 (White) with
probability 1 - 0.26 = 0.74. The p
keyword argument to choice
allows us to
do that:
# Select Black with probability 0.26, White with probability 0.74.
rng.choice(colors, p=[0.26, 0.74])
1
Run the cell above a few times to persuade yourself that you see 1 about quarter of the time, and 0 otherwise.
Multiple choices with size
#
The choice
method has another useful keyword argument, size
. We can use
this to ask choice
to make multiple choices at the same time. choice
returns the choices as an array. For example, say we wanted to choose 12
jurors, each with a probability of 0.26 of being 1 (Black). We could do this:
# Select 12 jurors in one go.
rng.choice(colors, p=[0.26, 0.74], size=12)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0])
Now consider the problem of choosing 4 children, each with a probability of 0.5 of being a Girl. Let’s do the same as we did before, and use 0 for a boy, and 1 for girl.
sexes = [0, 1]
children = rng.choice(sexes, p=[0.5, 0.5], size=4)
children
array([0, 1, 1, 0])
Notice that the default is for choice
to give equal probability the given
choices, so, in this case, we can simply miss out the p=[0.5, 0.5]
argument
and get the same outcome:
# choice selects each option with the same probability unless told otherwise.
children = rng.choice(sexes, size=4)
children
array([0, 0, 0, 1])
replace
to choose with and without replacement#
When you take choices from options, you do this in one of two ways:
With replacement.
Without replacement.
So far you have seen the with replacement option. This is the default for rng.choice
.
For example, let’s say we want to build a lottery to play with your fellow students. We want the kind of lottery where you have 10 balls, numbered 0 through 9. You put the balls into a bucket, close your eyes, shake the bucket, then draw one ball, and call out the number.
Then you put that ball back and repeat the procedure two more times.
Each student writes down three numbers from 0 through 9, such as ‘382’ or ‘919’.
The winner is the student who got the exact three numbers that you drew from the bucket.
We can simulate the lottery process like this:
balls = np.arange(10)
balls
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
drawn = rng.choice(balls, size=3)
drawn
array([0, 9, 6])
For reflection - if there are 100 students playing, what is the chance that at least one of the 25 will get the exact number?.
That is one way to do the lottery process, where we put the ball back before drawing the next ball. In that case, for each digit, you have the same chance of getting each of the numbers 0 through 9. The is sampling with replacement, when we draw the ball and put it back, before doing the next draw.
Another option is sampling without replacement. This is where we don’t put the drawn ball back, but put it aside.
If we do that, that changes the probabilities of the following numbers. Specifically, let us say we draw a 9 first, and put it aside, that means we cannot get a 9 for the second and third draws. For that case, it doesn’t make sense to guess ‘929’ as the three numbers, because this cannot happen. And in general, the digits cannot repeat in this case.
We can simulate that situation using the replace=False
option to the
rng.choice
function, like this:
drawn_without_replacement = rng.choice(balls, size=3, replace=False)
drawn_without_replacement
array([3, 0, 5])
If you run this cell a few times, you will see that you never get repeated digits.
Notice that replace=True
is the default, so these two cells both do sampling
with replacement, selecting 10 balls at random from the 10 available.
# With replacement is the default.
rng.choice(balls, size=10)
array([6, 7, 0, 3, 5, 7, 0, 5, 5, 7])
# You can also specify replace=True explicitly, for clarity.
rng.choice(balls, size=10, replace=True)
array([4, 9, 1, 2, 3, 0, 6, 2, 6, 5])
For reflection - if there are 100 students playing, what is the chance that at least one of the 25 will get the exact number, if we are sampling without replacement?.
One obvious place where we use sampling without replacement, is dealing a deck of cards. Once we have dealt a particular card, such as the ace of spades, that card cannot be dealt again — it is no longer in the pack from which we are dealing.
Sampling without replacement means that we can exhaust our choices. For example, if we try to draw 11 balls from the 10 available, we will get an error:
# Exhausting our options.
rng.choice(balls, size=11, replace=False)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[14], line 2
1 # Exhausting our options.
----> 2 rng.choice(balls, size=11, replace=False)
File numpy/random/_generator.pyx:853, in numpy.random._generator.Generator.choice()
ValueError: Cannot take a larger sample than population when replace is False
If we are drawing with replacement we can draw as many balls as we like:
# We can't exhaust our options.
rng.choice(balls, size=100, replace=True)
array([3, 0, 1, 1, 3, 4, 1, 3, 9, 7, 5, 7, 5, 7, 6, 3, 6, 8, 0, 8, 7, 3,
5, 0, 5, 8, 9, 8, 8, 7, 7, 0, 5, 3, 1, 0, 3, 3, 6, 0, 0, 7, 2, 7,
7, 0, 7, 6, 8, 1, 9, 3, 0, 8, 0, 8, 1, 1, 6, 2, 6, 3, 0, 7, 3, 9,
0, 8, 9, 5, 3, 4, 7, 4, 3, 1, 6, 6, 1, 2, 2, 2, 0, 7, 4, 2, 6, 5,
3, 4, 1, 7, 3, 2, 4, 3, 4, 0, 5, 1])