Leaping ahead


We are still building up to a solution for the three girls problem.

We have more of the building blocks we need.

# Load the Numpy package, and rename to "np"
import numpy as np
# Make random number generator.
rng = np.random.default_rng()

We are going to simulate 10000 families, each with four children.

Call one family one trial. Each trial involves the simulation of four children.

Here we put together what we have so far. We start by creating a Numpy array that has 10000 values, each of which is 0.

# Make an array of zeros to store the counts for each of the 10000 families.
counts = np.zeros(10000)
counts
array([0., 0., 0., ..., 0., 0., 0.])

counts has 10000 elements. When we have finished, each of these 10000 values will be the number of girls in one simulated family.

Following the approach from Boolean arrays, here is how we make an array that simulates a family of four children, and then count the number of girls.

# Make an array of four random choices from 0 or 1.
# 1 means a girl.
children = rng.choice([0, 1], size=4)
# Add up the integers to count the number of girls.
count = np.sum(children)
count
3
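To see why adding up the array counts the girls, here is a minimal sketch using a family written out by hand. The name `example_children` and its values are made up for illustration; they are not output from the simulation.

```python
import numpy as np

# A hypothetical family, written by hand: girl, boy, girl, girl.
example_children = np.array([1, 0, 1, 1])
# Each 1 is a girl and each 0 is a boy, so the sum is the number of girls.
example_count = np.sum(example_children)
print(example_count)  # 3
```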

We can store this count at the beginning of our 10000 counts:

# Store the count as the first value in the counts array.
counts[0] = count
# Show the first 10 values
counts[:10]
array([3., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

Here that is again, in a single cell:

# Our first simulated family
children = rng.choice([0, 1], size=4)
counts[0] = np.sum(children)
counts[:10]
array([2., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

To add another family, we just repeat the same, but storing the result as the second value in the counts array:

# Our second simulated family
children = rng.choice([0, 1], size=4)
counts[1] = np.sum(children)
counts[:10]
array([2., 2., 0., 0., 0., 0., 0., 0., 0., 0.])

Notice that the only thing that changed is that we stored the value at position (offset) 1 instead of position (offset) 0.
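The sketch below shows the same idea on a much smaller array, so you can see every value at once. The array `small_counts` and the values stored in it are assumptions for illustration only.

```python
import numpy as np

# A small stand-in for the 10000-element counts array.
small_counts = np.zeros(5)
# Offset 0 is the first value, offset 1 is the second value.
small_counts[0] = 3
small_counts[1] = 2
print(small_counts)  # [3. 2. 0. 0. 0.]
```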

Obviously this is getting a bit boring and repetitive. Surely we can do better.

Yes indeed, we can use a for loop. We will see much more of for loops very soon. You will recognize nearly all the code below from the single trials we did above. The new thing is the for loop, which tells Python to repeat the indented steps 10000 times.

# Reset the counts array to 10000 zeros.
counts = np.zeros(10000)

# Repeat the indented stuff 10000 times.
for i in np.arange(10000):
    # The procedure for one family.
    children = rng.choice([0, 1], size=4)
    count = np.sum(children)
    # Store the count at position i in the counts array.
    counts[i] = count
# Show the first 10 counts
counts[:10]
array([3., 3., 2., 2., 2., 4., 2., 4., 2., 2.])
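To see what the variable `i` is doing, here is a small sketch of the same loop pattern with only three repeats. The array `small` is a made-up example; the point is that `i` is 0 on the first pass through the indented code, 1 on the second, and 2 on the third.

```python
import numpy as np

# Three zeros to fill in.
small = np.zeros(3)
for i in np.arange(3):
    # Store the current value of i at position i.
    small[i] = i
print(small)  # [0. 1. 2.]
```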

Now we have a count of the number of girls, from 10000 simulated families:

len(counts)
10000

We use Boolean arrays to make an array of 10000 elements, where each element is True if the corresponding element in counts was equal to 3, and False otherwise.

# The Boolean array
has_3_girls = counts == 3

np.count_nonzero counts the number of True values (and therefore, the number of counts equal to 3).

# Number of counts values equal to 3.
n_3_girls = np.count_nonzero(has_3_girls)
n_3_girls
2485
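The comparison and the counting step may be easier to follow on a handful of values. This is a sketch with a made-up array, `example_counts`, rather than the simulated counts.

```python
import numpy as np

# Hypothetical counts for five families.
example_counts = np.array([3., 1., 3., 2., 4.])
# True at positions 0 and 2, where the value is 3; False elsewhere.
is_3 = example_counts == 3
# Count the True values.
n_is_3 = np.count_nonzero(is_3)
print(n_is_3)  # 2
```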

Finally we estimate the probability by dividing the number of times we saw 3 by the number of trials:

# Estimate probability of 3 girls.
n_3_girls / 10000
0.2485
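As a rough check on the estimate, we can work out the exact answer, assuming (as the simulation does) that each child is equally likely to be a girl or a boy, independently of the others. There are 2 ** 4 = 16 equally likely sequences of four children, and 4 of them have exactly three girls. The sketch below uses `comb` from Python's standard `math` module; the variable names are just for illustration.

```python
from math import comb

n_children = 4
n_girls = 3
# Number of ways to choose which 3 of the 4 children are girls.
n_ways = comb(n_children, n_girls)
# Each of the 2 ** 4 = 16 girl / boy sequences is equally likely.
exact_p = n_ways / 2 ** n_children
print(exact_p)  # 0.25
```

The simulated proportion should be close to this value, and will differ a little from run to run.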

The whole thing

Here then is the whole solution to the three girls problem.

It is made from the combination of the code in the cells above.

# Reset the counts array to 10000 zeros.
counts = np.zeros(10000)

# Repeat the indented stuff 10000 times.
for i in np.arange(10000):
    # The procedure for one family.
    children = rng.choice([0, 1], size=4)
    count = np.sum(children)
    # Store the count at position i in the counts array.
    counts[i] = count

# True where counts has the value 3, False otherwise.
has_3_girls = counts == 3

# Number of counts values equal to 3.
n_3_girls = np.count_nonzero(has_3_girls)

# Estimate and show probability of 3 girls.
n_3_girls / 10000
0.2533
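For comparison only, here is one possible way to get the same kind of estimate without a loop, by asking Numpy for all the children at once. This is an alternative sketch, not the method this page is building up to. The names `seeded_rng`, `all_children` and `fast_counts` are made up for illustration, and the seed value 42 is an arbitrary choice to make the run repeatable.

```python
import numpy as np

# A seeded random number generator, so the result is the same on each run.
seeded_rng = np.random.default_rng(42)
n_trials = 10000
# One row per family, one column per child.
all_children = seeded_rng.choice([0, 1], size=(n_trials, 4))
# Add up across each row to get the number of girls in each family.
fast_counts = np.sum(all_children, axis=1)
# Proportion of families with exactly 3 girls.
p_3_girls = np.count_nonzero(fast_counts == 3) / n_trials
print(p_3_girls)
```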

For loops

The new part of this story is the for loop. On to iteration with for loops.