Pandas plotting methods

We start by loading our familiar gender_data dataset.

# Load the Numpy array library, call it 'np'
import numpy as np
# Load the Pandas data science library, call it 'pd'
import pandas as pd
# Turn on a setting to use Pandas more safely.
pd.set_option('mode.copy_on_write', True)

If you are running on your laptop, you should download the gender_stats_min.csv file to the same directory as this notebook.

# Load the data file
gender_data = pd.read_csv('gender_stats_min.csv')
gender_data.head()

	country_name	country_code	gdp_us_billion	mat_mort_ratio	population
0	Aruba	ABW	NaN	NaN	0.103744
1	Afghanistan	AFG	19.961015	444.00	32.715838
2	Angola	AGO	111.936542	501.25	26.937545
3	Albania	ALB	12.327586	29.25	2.888280
4	Andorra	AND	3.197538	NaN	0.079547

# Get the GDP values as a Pandas Series
gdp = gender_data['gdp_us_billion']
gdp.head()

0           NaN
1     19.961015
2    111.936542
3     12.327586
4      3.197538
Name: gdp_us_billion, dtype: float64

Plotting with methods

You have already seen basic plotting with the Matplotlib library.

Here is the magic incantation to load the Matplotlib plotting library.

# Load the library for plotting, name it 'plt'
import matplotlib.pyplot as plt
# Make plots look a little more fancy
plt.style.use('fivethirtyeight')

Here is basic plotting of a Pandas series, using Matplotlib. This is what you have already seen.

plt.hist(gdp);

../_images/a031437b58b9ac9576b2a4be1ff1bb3064f301b9743a355765a746e3f1b4196a.png

You may see warnings as Matplotlib tries to calculate the bin widths for the histogram. If you do, the warnings result from Matplotlib struggling with NaN (missing) values.
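One way to avoid such warnings when using plt.hist is to drop the missing values first. Here is a minimal sketch, assuming a small made-up Series called example (not the real GDP data); dropna is the standard Series method for discarding NaN values.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up values standing in for a column with missing data.
example = pd.Series([np.nan, 19.9, 111.9, 12.3, 3.2, np.nan, 250.0])
# dropna returns a new Series without the NaN values.
example_no_nan = example.dropna()
# plt.hist returns the bin counts, the bin edges, and the drawn bars.
counts, edges, bars = plt.hist(example_no_nan)
```

Every remaining value lands in some bin, so the counts add up to the number of non-missing values.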

Another way to make the histogram is to use the hist method of the Series.

A method is a function attached to a value. In this case hist is a function attached to a value of type Series.

Using the hist method instead of the plt.hist function can make the code a bit easier to read. The method also has the advantage that it discards the NaN values, by default, so it does not generate the same warnings.

gdp.hist();
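To make the idea of a method concrete, here is a small sketch using made-up values (not the real GDP data). It compares the mean method of a Series with the np.mean function applied to the underlying Numpy array. Like hist, the mean method skips NaN values by default, whereas the Numpy function on a plain array does not.

```python
import numpy as np
import pandas as pd

# Made-up values, including one missing value.
values = pd.Series([1.0, np.nan, 3.0])

# A method: a function attached to the Series value.
# Series.mean skips NaN by default, so this is the mean of 1.0 and 3.0.
method_result = values.mean()

# A function applied to the plain Numpy array of the same values.
# np.mean does not skip NaN, so the result is NaN.
function_result = np.mean(values.to_numpy())

print(method_result)
print(function_result)
```

The conversion with to_numpy is deliberate: it makes sure we are calling the plain Numpy function on a plain array, rather than relying on how Numpy and Pandas interact.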

Now that we have had a look at the GDP values, we will look at the values in the mat_mort_ratio column. Each value is the number of women who die in childbirth for every 100,000 births.

mmr = gender_data['mat_mort_ratio']
mmr

0         NaN
1      444.00
2      501.25
3       29.25
4         NaN
        ...  
211       NaN
212    399.75
213    143.75
214    233.75
215    398.00
Name: mat_mort_ratio, Length: 216, dtype: float64

mmr.hist();

../_images/460c1256ce1b6d33a93bce02deceac32ccd57bc4ec65825267b061bdf21bb4c8.png

We are interested in the relationship between gdp and mmr. Maybe richer countries have better health care, and fewer maternal deaths.

Here is a plot, using the standard Matplotlib scatter function.

plt.scatter(gdp, mmr);

../_images/6c1d2b9fc2a0860f70aac3f7dfcc7a835fd2307e0b427f263099120b49efad7f.png

We can do the same plot using the plot.scatter method on the data frame. In that case, we specify the column names that should go on the x and the y axes.

gender_data.plot.scatter('gdp_us_billion', 'mat_mort_ratio');

../_images/4d3bb72bf9f3ed6a48094fd3cb75c5a7a05d3277f2b6b4c69c319c2bbd701028.png

An advantage of doing it this way is that we get the column names on the x and y axes by default.
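The sketch below checks this on a tiny data frame called toy, built from three rows copied from the table above (the name toy is ours, for illustration). The plot.scatter method returns a Matplotlib Axes object, and we can ask it for its labels. To get the same labels with plt.scatter, we would have to set them ourselves with plt.xlabel and plt.ylabel.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Three rows copied from the table above, for illustration only.
toy = pd.DataFrame({
    'gdp_us_billion': [19.961015, 111.936542, 12.327586],
    'mat_mort_ratio': [444.00, 501.25, 29.25],
})

# The method returns the Axes object it drew on.
ax = toy.plot.scatter('gdp_us_billion', 'mat_mort_ratio')
# Pandas has already set the axis labels from the column names.
x_label = ax.get_xlabel()
y_label = ax.get_ylabel()

# The plt.scatter version, in a new figure, with labels set by hand.
plt.figure()
plt.scatter(toy['gdp_us_billion'], toy['mat_mort_ratio'])
plt.xlabel('gdp_us_billion')
plt.ylabel('mat_mort_ratio')
```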
