Using the pathlib module

The pathlib module is one of several ways of manipulating and using file paths in Python, and the one we recommend to a beginner.

The primary documentation for pathlib is https://docs.python.org/3/library/pathlib.html.

The standard way to use the pathlib module is to import the Path class from the module:

from pathlib import Path

In Jupyter or IPython, you can tab complete on Path to list the methods (functions) and attributes attached to it.

An object (value) of type Path represents a pathname. A pathname is a string that identifies a particular file or directory on a computer filesystem.

Let us start by making a default object from the Path class, like this:

p = Path()
p
PosixPath('.')

By default, the path object, here p, refers to our current working directory, or . for short. The . is a relative path, meaning that it specifies a location relative to our current directory. On its own, . means exactly the current directory.

Because the . is a relative path, it does not tell us where we are in the filesystem, only where we are relative to the current directory.

Path objects have an absolute function attached to them. Another way of saying this is that Path objects have an absolute method. Calling this method gives us the absolute location of the path, meaning, the filesystem position relative to the base location of the disk the file is on.

abs_p = p.absolute()
abs_p
PosixPath('/home/runner/work/textbook/textbook/extra')

Notice the / at the front of the absolute path (on Unix). This is the root, meaning the base location for all files. If you are on Windows, you will instead see a drive location like C: or similar at the front of the absolute path.
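As a quick check of the difference between relative and absolute paths, here is a minimal sketch. It assumes two parts of the standard pathlib API that the text above does not cover: the is_absolute method and the Path.cwd class method. The variable names are only for illustration.

```python
from pathlib import Path

# A relative path: the current working directory, written as '.'
here = Path()
# The same location, as an absolute path.
abs_here = here.absolute()

# is_absolute() reports whether a path starts from the filesystem root
# (or from a drive such as C: on Windows).
print(here.is_absolute())      # False
print(abs_here.is_absolute())  # True

# Path.cwd() returns the current working directory as an absolute path,
# so it should match the result of calling absolute() on Path().
print(abs_here == Path.cwd())  # True
```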

We can always convert the Path object to a simple string, using the str function. str() converts anything to a string, if it can:

# The path, as a string
str(abs_p)
'/home/runner/work/textbook/textbook/extra'

Sometimes we want to get a path referring to the directory that contains another path. The Path object has a parent attribute attached to it (an attribute is data attached to an object). The parent attribute is a Path object for the containing directory:

abs_p.parent
PosixPath('/home/runner/work/textbook/textbook')

The parent attribute of the Path object gives the directory name from a full file path. It works correctly for Unix paths on Unix machines, and Windows paths on Windows machines.

# On Unix
a_path = Path('/a/full/path/then_filename.txt')
# Show the directory containing the file.
a_path.parent
PosixPath('/a/full/path')
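If you want to see the Windows behavior without a Windows machine, one option is the sketch below. It goes slightly beyond the text above by assuming the PurePosixPath and PureWindowsPath classes, which are also part of pathlib; the example paths are made up for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths only manipulate the path text; they never touch the disk,
# so both flavors can be created on any operating system.
unix_path = PurePosixPath('/a/full/path/then_filename.txt')
win_path = PureWindowsPath(r'C:\a\full\path\then_filename.txt')

print(unix_path.parent)  # /a/full/path
print(win_path.parent)   # C:\a\full\path
```

For everyday work, the plain Path class is usually what you want, because it picks the right flavor for the machine you are on.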

parent also works for relative paths.

# On Unix
rel_path = Path('relative/path/then_filename.txt')
rel_path.parent
PosixPath('relative/path')

Use the name attribute of the Path object to get the filename rather than the directory name:

# On Unix
rel_path.name
'then_filename.txt'

Sometimes you want to join one or more directory names with a filename to get a path. Path objects have a clever way of doing this, by overloading the / (division) operator.

To remind you about operator overloading, remember that addition means different things for numbers and strings. For numbers, addition means arithmetic addition:

# Addition for numbers
2 + 2
4

For strings, addition means concatenation — sticking the strings together:

# Addition for strings.
"first" + "second"
'firstsecond'

Path objects use the division operator / to mean “stick the path fragments together to make a new path, where the / separates directories”.

# On Unix
Path('relative') / 'path' / 'then_filename.txt'
PosixPath('relative/path/then_filename.txt')

This works in the same way on Windows and Unix.
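The / operator is not the only way to join path fragments. The sketch below assumes two alternatives from the standard pathlib API that are not covered in the text above: the joinpath method, and passing several fragments to Path at once. The variable names are hypothetical.

```python
from pathlib import Path

# Directory and file names held in variables (names are illustrative).
data_dir = Path('relative') / 'path'
fname = 'then_filename.txt'

with_slash = data_dir / fname
# joinpath does the same joining as the / operator.
with_joinpath = data_dir.joinpath(fname)
# Path also accepts several fragments at once.
with_args = Path('relative', 'path', fname)

print(with_slash == with_joinpath == with_args)  # True
# as_posix() gives the string with forward slashes on every platform.
print(with_slash.as_posix())  # relative/path/then_filename.txt
```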

Sometimes you want to get the filename extension. Use the suffix attribute for this:

rel_path
PosixPath('relative/path/then_filename.txt')
rel_path.suffix
'.txt'

You will often find yourself wanting to replace the file extension. You can do this with the with_suffix method:

rel_path.with_suffix('.md')
PosixPath('relative/path/then_filename.md')
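Note that with_suffix gives back a new Path object rather than changing the one you called it on. A minimal sketch to confirm this, where md_path is a name chosen only for this example:

```python
from pathlib import Path

rel_path = Path('relative/path/then_filename.txt')
md_path = rel_path.with_suffix('.md')

print(md_path.name)     # then_filename.md
print(rel_path.suffix)  # .txt (the original path is unchanged)
# Only the extension differs; the containing directory is the same.
print(md_path.parent == rel_path.parent)  # True
```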

Path objects also have methods that allow you to read and write text characters and raw bytes.

Let us make a new path to point to a file we will write in the current directory.

new_path = Path() / 'a_test_file.txt'
new_path
PosixPath('a_test_file.txt')

We can write text characters (strings) to this file, with the write_text method:

a_multiline_string = """Some text.
More text.
Last text."""
new_path.write_text(a_multiline_string)
32

We can read the text out of a file using read_text:

new_path.read_text()
'Some text.\nMore text.\nLast text.'

Similarly, we can write and read raw byte data, using write_bytes and read_bytes.
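Here is a small sketch of the bytes versions. It assumes the standard tempfile module to make a throwaway directory, so the example cleans up after itself; the filename a_test_file.bin and the byte values are invented for illustration.

```python
import tempfile
from pathlib import Path

# Work in a throwaway directory so we do not leave files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    bytes_path = Path(tmp_dir) / 'a_test_file.bin'
    # Bytes literals start with b; this one holds 4 bytes.
    raw = b'\x00\x01\x02\xff'
    # Like write_text, write_bytes returns how much it wrote.
    n_written = bytes_path.write_bytes(raw)
    read_back = bytes_path.read_bytes()

print(n_written)         # 4
print(read_back == raw)  # True
```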

It is often useful to read in a text file, and split the result into lines. We do this with read_text, and then we use the splitlines method of the resulting string to split the text into lines.

text = new_path.read_text()
text.splitlines()
['Some text.', 'More text.', 'Last text.']

Listing files in a directory

We can use the glob method of the Path object to find all, or some, of the files in a directory.

For example, to see all files in the current directory, we could do this:

cwd = Path()
list(cwd.glob('*'))
[PosixPath('pathlib.Rmd'),
 PosixPath('length_one_tuples.Rmd'),
 PosixPath('string_formatting.Rmd'),
 PosixPath('mean_deviations.md'),
 PosixPath('introducing_python.Rmd'),
 PosixPath('assert.Rmd'),
 PosixPath('truthiness.Rmd'),
 PosixPath('slope_deviations.md'),
 PosixPath('more_on_lists.Rmd'),
 PosixPath('extra.md'),
 PosixPath('monty_hall_lists.Rmd'),
 PosixPath('data8_functions.Rmd'),
 PosixPath('a_test_file.txt'),
 PosixPath('mean_sq_deviations.md'),
 PosixPath('brisk_python.Rmd')]

Notice two things here: the '*' argument to glob, and the list call around its output.

Selecting files with glob

The argument to the glob method above is '*'. The '*' tells glob to get all files and directories, using what is called a glob match. This is a powerful feature that allows you to be selective in asking for the files that glob returns. For example, if you wanted to see only the files ending with .txt, you could do:

list(cwd.glob('*.txt'))
[PosixPath('a_test_file.txt')]

There is more detail in the pathlib documentation linked above.
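To try out a couple of patterns without depending on what is in your own directory, here is a sketch that builds a temporary directory first. The tempfile module and the filenames are assumptions made for this example. Besides '*', it uses '?', which matches exactly one character.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_path = Path(tmp_dir)
    # Create a few empty files to search through.
    for fname in ('notes.txt', 'data.txt', 'readme.md'):
        (tmp_path / fname).write_text('')
    # Sort the names, because glob does not promise any particular order.
    txt_names = sorted(p.name for p in tmp_path.glob('*.txt'))
    # Five ? marks match names with exactly five characters before .txt.
    short_names = sorted(p.name for p in tmp_path.glob('?????.txt'))

print(txt_names)    # ['data.txt', 'notes.txt']
print(short_names)  # ['notes.txt']
```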

list around the output of glob

Notice that we used list around the output of glob, as in, for example:

list(cwd.glob('*.txt'))
[PosixPath('a_test_file.txt')]

This is because glob returns something called a generator, which can return all the Path objects, but will not do that until we ask it to.

cwd.glob('*.txt')
<generator object Path.glob at 0x7f9664443890>

The list call converts the result into a list, and in doing so, asks the generator to return all the Path objects:

list(cwd.glob('*.txt'))
[PosixPath('a_test_file.txt')]
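You do not have to call list to get at the results; a for loop also asks for the paths, one at a time. The sketch below, which assumes a temporary directory from the standard tempfile module, also shows a consequence worth knowing: once the object that glob returns has handed over all its paths, it has nothing left to give.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_path = Path(tmp_dir)
    (tmp_path / 'a_test_file.txt').write_text('Some text.')
    matches = tmp_path.glob('*.txt')
    # A for loop asks for the paths one at a time.
    names = []
    for found in matches:
        names.append(found.name)
    # The same object is now used up, so a second pass finds nothing.
    leftovers = list(matches)

print(names)      # ['a_test_file.txt']
print(leftovers)  # []
```

If you need the results more than once, store the list in a variable, or call glob again.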

Deleting files

Finally, to be tidy, we use the unlink method to delete the temporary file we were using. The name unlink is a little strange; it refers to the way the computer's filesystem stores files. For our purposes, calling it deletes the file at that path.

new_path.unlink()
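To check that a file really has gone, you could use the exists method, as in the sketch below. The exists method and the missing_ok argument to unlink are parts of the standard pathlib API not covered above; the temporary directory and the filename scratch.txt are assumptions made for this example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    scratch = Path(tmp_dir) / 'scratch.txt'
    scratch.write_text('Temporary text.')
    existed_before = scratch.exists()
    scratch.unlink()
    existed_after = scratch.exists()
    # Without missing_ok=True, unlinking a missing file raises
    # FileNotFoundError. The missing_ok argument needs Python 3.8 or later.
    scratch.unlink(missing_ok=True)

print(existed_before)  # True
print(existed_after)   # False
```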