Using the pathlib module#
The pathlib module is one of several ways of manipulating and using file paths in
Python, and the one we recommend to a beginner.
The primary documentation for pathlib is
https://docs.python.org/3/library/pathlib.html.
The standard way to use the pathlib module is to import the Path
class from the module:
from pathlib import Path
In Jupyter or IPython, you can tab complete on Path to list the methods
(functions) and attributes attached to it.
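Outside Jupyter or IPython, one way to get a similar listing is the built-in dir function. This is a minimal sketch; the variable name public_names is ours, and we skip names starting with an underscore because those are usually meant for internal use.

```python
from pathlib import Path

# Names attached to the Path class, leaving out the underscore-prefixed ones.
public_names = [name for name in dir(Path) if not name.startswith('_')]

# Check for a few of the names used later on this page.
for name in ('absolute', 'parent', 'name', 'suffix', 'glob'):
    print(name, name in public_names)
```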
An object (value) of type Path
represents a pathname. A pathname is a string
that identifies a particular file or directory on a computer filesystem.
Let us start by making a default object from the Path
class, like this:
p = Path()
p
PosixPath('.')
By default, the path object, here p, refers to our current working directory,
or '.' for short. '.' is a relative path, meaning that it specifies a location
relative to our current directory. '.' means exactly our current directory.
Because '.' is a relative path, it does not tell us where we are in
the filesystem, only where we are relative to the current directory.
Path objects have an absolute function attached to them. Another way of
saying this is that Path objects have an absolute method. Calling
this method gives us the absolute location of the path, meaning the
position in the filesystem relative to the base location of the disk the
file is on.
abs_p = p.absolute()
abs_p
PosixPath('/home/runner/work/textbook/textbook/extra')
Notice the / at the front of the absolute path (on Unix); it means the
base location for all files. If you are on Windows, you will instead see a
drive location, such as C:, at the front of the absolute path.
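To check whether a path is relative or absolute, we can ask the path itself. The sketch below uses the is_absolute method and the anchor attribute, which are part of pathlib but have not been introduced on this page. The anchor is the leading part of an absolute path, so what you see depends on your system.

```python
from pathlib import Path

p = Path()
abs_p = p.absolute()

print(p.is_absolute())      # False, because '.' is a relative path
print(abs_p.is_absolute())  # True

# The absolute version of '.' is the current working directory.
print(abs_p == Path.cwd())  # True

# '/' on Unix; a drive location followed by a backslash on Windows.
print(abs_p.anchor)
```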
We can always convert the Path
object to a simple string, using the
str
function. str()
converts anything to a string, if it can:
# The path, as a string
str(abs_p)
'/home/runner/work/textbook/textbook/extra'
Sometimes we want to get a path referring to the directory containing a path.
The Path object has a parent
attribute attached to it (an attribute is data
attached to an object). The parent
attribute is a Path object for the
containing directory:
abs_p.parent
PosixPath('/home/runner/work/textbook/textbook')
The parent
attribute of the Path object gives the directory name from a
full file path. It works correctly for Unix paths on Unix machines, and
Windows paths on Windows machines.
# On Unix
a_path = Path('/a/full/path/then_filename.txt')
# Show the directory containing the file.
a_path.parent
PosixPath('/a/full/path')
parent also works for relative paths.
# On Unix
rel_path = Path('relative/path/then_filename.txt')
rel_path.parent
PosixPath('relative/path')
Use the name
attribute of the Path object to get the filename rather
than the directory name:
# On Unix
rel_path.name
'then_filename.txt'
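If you want to see how parent and name behave for Windows-style paths without having a Windows machine, pathlib also provides PurePosixPath and PureWindowsPath. These are not used elsewhere on this page; they only manipulate the path text, and never touch the filesystem, so they work on any system. A short sketch:

```python
from pathlib import PurePosixPath, PureWindowsPath

unix_path = PurePosixPath('/a/full/path/then_filename.txt')
win_path = PureWindowsPath(r'C:\a\full\path\then_filename.txt')

# The containing directory, in each path style.
print(unix_path.parent)  # /a/full/path
print(win_path.parent)   # C:\a\full\path

# The filename is the same in both cases.
print(unix_path.name)    # then_filename.txt
print(win_path.name)     # then_filename.txt
```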
Sometimes you want to join one or more directory names with a filename to get a
path. Path objects have a clever way of doing this, by overloading the /
(division) operator.
To remind you about operator overloading, remember that addition means different things for numbers and strings. For numbers, addition means arithmetic addition:
# Addition for numbers
2 + 2
4
For strings, addition means concatenation — sticking the strings together:
# Addition for strings.
"first" + "second"
'firstsecond'
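A class decides what an operator means for its own objects by defining a specially named method. For the / operator, that method is called __truediv__. The sketch below uses a made-up class called Fragment, purely to illustrate the idea; it is not part of pathlib, and it is not how pathlib is actually written.

```python
class Fragment:
    """Hypothetical toy class, only to illustrate operator overloading."""

    def __init__(self, text):
        self.text = text

    def __truediv__(self, other):
        # Python calls this method for `self / other`.
        # Here we choose to join the two pieces with a slash.
        return Fragment(self.text + '/' + str(other))

    def __str__(self):
        return self.text


joined = Fragment('relative') / 'path' / 'then_filename.txt'
print(joined)  # relative/path/then_filename.txt
```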
Path objects use the division operator /
to mean “stick the path fragments
together to make a new path, where the /
separates directories”.
# On Unix
Path('relative') / 'path' / 'then_filename.txt'
PosixPath('relative/path/then_filename.txt')
This works in the same way on Windows and Unix.
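To see both styles side by side on one machine, we can use the PurePosixPath and PureWindowsPath classes from pathlib. These are not otherwise covered on this page; they handle the path text only, without looking at the filesystem. The sketch also shows the joinpath method, which does the same job as the / operator.

```python
from pathlib import PurePosixPath, PureWindowsPath

unix_joined = PurePosixPath('relative') / 'path' / 'then_filename.txt'
win_joined = PureWindowsPath('relative') / 'path' / 'then_filename.txt'

print(unix_joined)  # relative/path/then_filename.txt
print(win_joined)   # relative\path\then_filename.txt

# joinpath gives the same result as chaining the / operator.
also_joined = PurePosixPath('relative').joinpath('path', 'then_filename.txt')
print(also_joined == unix_joined)  # True
```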
Sometimes you want to get the filename extension. Use the suffix
attribute for this:
rel_path
PosixPath('relative/path/then_filename.txt')
rel_path.suffix
'.txt'
You will often find yourself wanting to replace the file extension. You can do
this with the with_suffix
method:
rel_path.with_suffix('.md')
PosixPath('relative/path/then_filename.md')
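Two details are worth checking here. with_suffix gives back a new Path object and leaves the original alone; it does not rename any file on disk. And suffix only gives the last extension. The sketch below also uses the stem and suffixes attributes, which this page has not introduced; archive.tar.gz is a made-up filename.

```python
from pathlib import Path

rel_path = Path('relative/path/then_filename.txt')
md_path = rel_path.with_suffix('.md')

print(md_path.name)   # then_filename.md
print(rel_path.name)  # then_filename.txt (the original path is unchanged)

# The filename without its extension.
print(rel_path.stem)  # then_filename

# With more than one extension, suffix gives only the last.
archive = Path('archive.tar.gz')
print(archive.suffix)    # .gz
print(archive.suffixes)  # ['.tar', '.gz']
```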
Path objects also have methods that allow you to read and write text characters and raw bytes.
Let us make a new path to point to a file we will write in the current directory.
new_path = Path() / 'a_test_file.txt'
new_path
PosixPath('a_test_file.txt')
We can write text characters (strings) to this file, with the write_text
method:
a_multiline_string = """Some text.
More text.
Last text."""
new_path.write_text(a_multiline_string)
32
We can read the text out of a file using read_text:
new_path.read_text()
'Some text.\nMore text.\nLast text.'
Similarly, we can write and read raw byte data, using write_bytes and
read_bytes.
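Here is a minimal sketch of writing and reading bytes. To avoid leaving files behind, it works in a temporary directory made with the standard tempfile module; the filename a_test_file.bin and the byte values are made up for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    bytes_path = Path(tmp_dir) / 'a_test_file.bin'
    # write_bytes returns the number of bytes written.
    n_written = bytes_path.write_bytes(b'\x00\x01\x02abc')
    # read_bytes gives back a bytes object, not a string.
    data = bytes_path.read_bytes()

print(n_written)  # 6
print(data)       # b'\x00\x01\x02abc'
```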
It is often useful to read in a text file, and split the result into lines. We
do this with read_text, and then we use the splitlines method of the resulting
string to split the text into lines.
text = new_path.read_text()
text.splitlines()
['Some text.', 'More text.', 'Last text.']
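Many text files end with a final newline character. One reason to prefer splitlines over splitting on '\n' yourself is that splitlines does not leave an empty string at the end in that case. A small sketch, using a made-up string rather than a file:

```python
text_with_final_newline = 'Some text.\nMore text.\nLast text.\n'

lines = text_with_final_newline.splitlines()
print(lines)
# ['Some text.', 'More text.', 'Last text.']

# Splitting on the newline character leaves an empty string at the end.
pieces = text_with_final_newline.split('\n')
print(pieces)
# ['Some text.', 'More text.', 'Last text.', '']
```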
Listing files in a directory#
We can use the glob method of the Path object to list all, or some, of the
files in a directory.
For example, to see all files in the current directory, we could do this:
cwd = Path()
list(cwd.glob('*'))
[PosixPath('pathlib.Rmd'),
PosixPath('length_one_tuples.Rmd'),
PosixPath('string_formatting.Rmd'),
PosixPath('mean_deviations.md'),
PosixPath('introducing_python.Rmd'),
PosixPath('assert.Rmd'),
PosixPath('truthiness.Rmd'),
PosixPath('slope_deviations.md'),
PosixPath('more_on_lists.Rmd'),
PosixPath('extra.md'),
PosixPath('monty_hall_lists.Rmd'),
PosixPath('data8_functions.Rmd'),
PosixPath('a_test_file.txt'),
PosixPath('mean_sq_deviations.md'),
PosixPath('brisk_python.Rmd')]
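Notice two things here: the '*' argument to glob, and the list call around the
output of glob. The next two sections cover these in turn.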
Selecting files with glob#
The argument to the glob method above is '*'. The '*' tells glob to
get all files and directories, using what is called a glob match. This is a
powerful feature that allows you to be selective in asking for the files that
glob returns. For example, if you wanted to see only the files ending with .txt
you could do:
list(cwd.glob('*.txt'))
[PosixPath('a_test_file.txt')]
There is more detail in the page linked above.
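As a further sketch of what glob patterns can do, the example below builds a small made-up directory tree in a temporary directory, then tries three patterns. The file and directory names are invented for the example. The '**' pattern and the square-bracket pattern are standard glob features that this page does not otherwise cover.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    root = Path(tmp_dir)
    (root / 'notes.txt').write_text('a')
    (root / 'page.md').write_text('b')
    (root / 'sub').mkdir()
    (root / 'sub' / 'deep.txt').write_text('c')

    # Files ending in .txt, directly inside root.
    top_txt = sorted(path.name for path in root.glob('*.txt'))
    # '**' also searches inside subdirectories.
    all_txt = sorted(path.name for path in root.glob('**/*.txt'))
    # Names starting with either n or p.
    n_or_p = sorted(path.name for path in root.glob('[np]*'))

print(top_txt)  # ['notes.txt']
print(all_txt)  # ['deep.txt', 'notes.txt']
print(n_or_p)   # ['notes.txt', 'page.md']
```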
list around the output of glob#
Notice that we used list
around the output of glob
, as in, for example:
list(cwd.glob('*.txt'))
[PosixPath('a_test_file.txt')]
This is because glob returns something called a generator, which can
return all the Path objects, but will not do that until we ask it to.
cwd.glob('*.txt')
<generator object Path.glob at 0x7f9664443890>
The list
call converts the result into a list, and in doing so, asks the
generator to return all the Path objects:
list(cwd.glob('*.txt'))
[PosixPath('a_test_file.txt')]
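list is not the only way to ask the generator for its Path objects. A for loop asks for them one at a time, and sorted collects them all in sorted order, which is useful because you should not rely on the order glob gives. A sketch, using a temporary directory and made-up filenames:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    root = Path(tmp_dir)
    for fname in ('b.txt', 'a.txt'):
        (root / fname).write_text('some text')

    # Loop over the generator directly; no list call needed.
    names = []
    for path in root.glob('*.txt'):
        names.append(path.name)

    # sorted also uses up the generator, and returns a sorted list.
    sorted_names = [path.name for path in sorted(root.glob('*.txt'))]

print(len(names))    # 2
print(sorted_names)  # ['a.txt', 'b.txt']
```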
Deleting files#
And finally, to be tidy, we use the unlink method to delete the temporary
file we were using. unlink is a strange name; it refers to the way
the computer disk system stores files, but calling it does always have the
effect of deleting the file.
new_path.unlink()
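We can confirm that the file has gone with the exists method. Calling unlink on a file that is not there raises a FileNotFoundError; from Python 3.8, passing missing_ok=True avoids that error. The sketch below uses its own file in a temporary directory, with the made-up name temp_path, so it does not depend on the file above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    temp_path = Path(tmp_dir) / 'a_test_file.txt'
    temp_path.write_text('Some text.')
    existed_before = temp_path.exists()

    temp_path.unlink()
    exists_after = temp_path.exists()

    # The file is already gone; missing_ok=True means no error is raised.
    temp_path.unlink(missing_ok=True)

print(existed_before)  # True
print(exists_after)    # False
```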