A simpler problem#
Imagine a family with four children.
What is the probability that the family will have exactly three girls?
There are various ways to answer this question. One way, is to use simulation.
Simulation makes a model of the problem. We use the model to generate simulated data. If the model is a good one, the simulated data should look like the real data.
First we need to simulate a family of four children.
Then we need to count the number of girls.
We do this many many times, and see how often we get a count of 3.
In our model, the chances of any one child being a boy or a girl is 0.5, or 50%. To do our simulation, we need something random, like the toss of a coin, to decide if the next child is a boy or a girl.
Tossing a coin is a bit time-consuming, and your computer is well designed to solve this task in a more efficient way. It can generate random numbers.
To make random numbers, and do other random things, we need a Numpy Random Number Generator. We make one like this:
# Get the Numpy library
import numpy as np
# Make a random number generator
rng = np.random.default_rng()
# Show the result
rng
Generator(PCG64) at 0x7FFBDC1FACE0
This allows us to do things like generate random numbers. Here is an example; we generate a random whole number anywhere between 0 and 1:
rng.uniform(0, 1)
0.3027184874034289
We can also ask our rng
to generate random whole numbers using the
integers
function (well, actually, it’s a method, but that’s a distinction
we won’t worry about for now).
For example, to generate a random whole number from 0 through 9, we could do this:
# Get a random number from 0 through 9, store in "a"
a = rng.integers(0, 10)
# Show the result.
a
0
Run the cell above a few times by clicking inside the cell, and pressing Cmd-Enter a few times. You should see random numbers from 0 through 9.
Notice that we write rng.integers(0, 10)
and not rng.integers(0, 9)
. The
second number, 10, is one above the largest integer we will allow. Read this
as a random integer from 0 up to but not including 10.
Now we have the integers
function, we can run the following cell 4 times, to
get 4 random numbers between 0 and 1.
Call 0 a boy, and 1 a girl. If we run this four times, then we have one simulated family. We can count how many 1s (girls) we got in the four runs, and that is the simulated number of girls, for this family.
# Return a random number that is either 0 or 1.
# Read this as:
# Return a whole number from 0 up to, but not including 2.
rng.integers(0, 2)
1
rng.integers
is a function (well, a method, but whatever).
rng.integers(0, 2)
calls the function, and returns a random number, that is
either 0 and 1.
It’s inconvenient to have to run this cell many times. We really need some machinery to make the computer do that for us. We need variables, functions, comparisons and arrays. We will deal with those next.