Arrays#

There are several kinds of sequences in Python. A list is one. However, the sequence type that we will use most in the class, is the array.

The numpy package, abbreviated np in programs, provides Python programmers with convenient and powerful functions for creating and manipulating arrays.

# Load the Numpy package, and call it "np".
import numpy as np

Creating arrays#

The array function from the Numpy package creates an array from single values, or sequences of values.

For example, remember my_list?

my_list = [1, 2, 3]

This is a list:

type(my_list)
list

The array function from Numpy can make an array from this list:

my_array = np.array(my_list)
my_array
array([1, 2, 3])

As you can see from the display above, this is an array. We confirm it with type:

type(my_array)
numpy.ndarray

We can also create the list and then the array in one call, like this:

my_array = np.array([1, 2, 3])
my_array
array([1, 2, 3])

Here [1, 2, 3] is an expression that returns a list. np.array then operates on the returned list, to create an array.

Arrays often contain numbers, but, like lists, they can also contain strings or other types of values. However, a single array can only contain a single kind of data. (It usually doesn’t make sense to group together unlike data anyway.)

For example,

english_parts_of_speech = np.array(["noun", "pronoun", "verb", "adverb", "adjective", "conjunction", "preposition", "interjection"])
english_parts_of_speech
array(['noun', 'pronoun', 'verb', 'adverb', 'adjective', 'conjunction',
       'preposition', 'interjection'], dtype='<U12')

We have not seen this yet, but Python allows us to spread expressions between round and square brackets across many lines. It knows that the expression has not finished yet because it is waiting for the closing bracket. For example, this cell works in the exactly the same way as the cell above, and may be easier to read:

# An expression between brackets spread across many lines.
english_parts_of_speech = np.array(
    ["noun",
    "pronoun",
    "verb",
    "adverb",
    "adjective",
    "conjunction",
    "preposition",
    "interjection"]
    )
english_parts_of_speech
array(['noun', 'pronoun', 'verb', 'adverb', 'adjective', 'conjunction',
       'preposition', 'interjection'], dtype='<U12')

Below, we collect four different temperatures into a list called temps. These are the estimated average daily high temperatures over all land on Earth (in degrees Celsius) for the decades surrounding 1850, 1900, 1950, and 2000, respectively, expressed as deviations from the average absolute high temperature between 1951 and 1980, which was 14.48 degrees.

If you are interested, you can get more data from the Berkeley Earth project.

highs = np.array([13.6  , 14.387, 14.585, 15.164])
highs
array([13.6  , 14.387, 14.585, 15.164])

Calculations with arrays#

Arrays can be used in arithmetic expressions to compute over their contents. When an array is combined with a single number, that number is combined with each element of the array. Therefore, we can convert all of these temperatures to Fahrenheit by writing the familiar conversion formula.

(9/5) * highs + 32
array([56.48  , 57.8966, 58.253 , 59.2952])

As we saw for strings, arrays have methods, which are functions that operate on the array values. The mean of a collection of numbers is its average value: the sum divided by the length. Each pair of parentheses in the examples below is part of a call expression; it’s calling a function with no arguments to perform a computation on the array called highs.

# The number of elements in the array
highs.size
4
highs.sum()
57.736000000000004
highs.mean()
14.434000000000001

Functions on Arrays#

Numpy provides various useful functions for operating on arrays.

For example, the diff function computes the difference between each adjacent pair of elements in an array. The first element of the diff is the second element minus the first.

np.diff(highs)
array([0.787, 0.198, 0.579])

The full Numpy reference lists these functions exhaustively, but only a small subset are used commonly for data processing applications. These are grouped into different packages within np. Learning this vocabulary is an important part of learning the Python language, so refer back to this list often as you work through examples and problems.

However, you don’t need to memorize these. Use this as a reference.

Each of these functions takes an array as an argument and returns a single value.

Function

Description

np.sum

Add all elements together

np.prod

Multiply all elements together

np.all

Test whether all elements are true values (non-zero numbers are true)

np.any

Test whether any elements are true values (non-zero numbers are true)

np.count_nonzero

Count the number of non-zero elements

np.mean

Calculate the mean (sum divided by number of elements)

np.min

The minimum over the array

np.max

The maximum over the array

Each of these functions takes an array as an argument and returns an array of values.

Function

Description

np.diff

Difference between adjacent elements

np.round

Round each number to the nearest integer (whole number)

np.abs

Absolute value of each number (remove minus signs

np.cumprod

A cumulative product: for each element, multiply all elements so far

np.cumsum

A cumulative sum: for each element, add all elements so far

np.exp

Exponentiate each element

np.log

Take the natural logarithm of each element

np.sqrt

Take the square root of each element

np.sort

Sort the elements

Each of these functions takes an array of strings and returns an array.

Function

Description

np.char.lower

Lowercase each element

np.char.upper

Uppercase each element

np.char.strip

Remove spaces at the beginning or end of each element

np.char.isalpha

Whether each element is only letters (no numbers or symbols)

np.char.isnumeric

Whether each element is only numeric (no letters)

Each of these functions takes both an array of strings and a search string; each returns an array.

Function

Description

np.char.count

Count the number of times a search string appears among the elements of an array

np.char.find

The position within each element that a search string is found first

np.char.rfind

The position within each element that a search string is found last

np.char.startswith

Whether each element starts with the search string

Note

This page has content from the Arrays notebook of an older version of the UC Berkeley data science course. See the Berkeley course section of the license file.