pathlib is a standard-library module in Python that provides an object-oriented interface to file system paths. With pathlib, you can work with file paths in a more intuitive and readable way, without using string operations to split and join paths.
os module vs pathlib
Before pathlib was added in Python 3.4, the traditional way to handle file paths was the os module, in particular its os.path submodule.
We can demonstrate the value of pathlib with a common task in data science: finding all png files in a given directory.
Using the os module, we might write the following code:
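import os

dir_path = "/home/user/documents"
# Keep only regular files whose names end in .png
files = [
    os.path.join(dir_path, f)
    for f in os.listdir(dir_path)
    if os.path.isfile(os.path.join(dir_path, f)) and f.endswith(".png")
]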
While this code solves the immediate task of finding png files, it reveals several drawbacks of the os module. First, the code is long and hard to read, which is a shame for such a simple operation. Second, it assumes familiarity with list comprehensions, which should not be taken for granted. Third, it relies on string operations (joining paths with os.path.join and matching extensions with str.endswith), which are error-prone.
If we use the pathlib module instead, the code becomes much simpler. As mentioned, pathlib provides an object-oriented approach to handling file system paths.
from pathlib import Path

# Create a Path object
dir_path = Path("/home/user/documents")
# Find all png files inside the directory
files = list(dir_path.glob("*.png"))
Object-oriented programming organizes code around objects and their interactions, which tends to make code more modular, reusable, and maintainable.
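To check the comparison ourselves, here is a minimal sketch that builds a throwaway directory tree with tempfile (the names a.png, b.png, notes.txt, and sub are made up for the demo) and runs the os-based search next to the pathlib one. It also shows rglob(), which extends the search into every subdirectory. The extra is_file() filter on the pathlib side is our own addition, so that a directory whose name happens to end in .png is skipped just as in the os version.

```python
import os
import tempfile
from pathlib import Path


def find_png_os(dir_path):
    """Non-recursive search with the os module (same logic as the first example)."""
    return sorted(
        os.path.join(dir_path, f)
        for f in os.listdir(dir_path)
        if os.path.isfile(os.path.join(dir_path, f)) and f.endswith(".png")
    )


def find_png_pathlib(dir_path):
    """Non-recursive search with pathlib; is_file() drops directories named *.png."""
    return sorted(p for p in Path(dir_path).glob("*.png") if p.is_file())


def find_png_recursive(dir_path):
    """Recursive search: rglob() also descends into all subdirectories."""
    return sorted(p for p in Path(dir_path).rglob("*.png") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny tree: one png at the top, one png in a subdirectory, one txt file.
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.png").touch()
    (root / "notes.txt").touch()
    (root / "sub" / "b.png").touch()

    os_result = [Path(p) for p in find_png_os(tmp)]
    pathlib_result = find_png_pathlib(tmp)
    recursive_names = [p.name for p in find_png_recursive(tmp)]

print(os_result == pathlib_result)          # True
print([p.name for p in pathlib_result])     # ['a.png']
print(recursive_names)                      # ['a.png', 'b.png']
```

Both approaches return the same single top-level file; only rglob() picks up the png file in the subdirectory.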
Using Path Objects in Python
The pathlib library revolves around the Path object, which represents file system paths in a structured, platform-independent way.
We use the following line of code to import the Path class from the pathlib module into the current namespace:
from pathlib import Path
After importing the Path class, we can create Path objects in a variety of ways: from strings, from other Path objects, from the current working directory, and from the home directory.
Create a path object from a string
We can create a Path object by passing a string representing a file system path to the Path constructor. This converts the string representation of the path into a Path object.
file_path_str = "data/union_data.csv"
data_path = Path(file_path_str)
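A small sketch of what the conversion buys us: pathlib normalizes spurious slashes and single-dot components when the object is built, and str() turns the object back into text. We use PurePosixPath here (an assumption for the demo) so the printed results are the same on every operating system; on Windows a plain Path would print backslashes instead.

```python
from pathlib import Path, PurePosixPath

file_path_str = "data/union_data.csv"
data_path = Path(file_path_str)

# A messy string with a doubled slash and a "." component is normalized on construction.
messy = PurePosixPath("data//./union_data.csv")
print(messy)                                         # data/union_data.csv
print(messy == PurePosixPath(file_path_str))         # True

# str() converts the object back to a plain string.
print(str(PurePosixPath(file_path_str)) == file_path_str)  # True

# The Path object already knows its components.
print(data_path.name, data_path.suffix)              # union_data.csv .csv
```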
Create a path object from other path objects
An existing Path object can be used as a building block for creating a new path.
Here we combine a base path, a data directory, and a filename into a single file path. pathlib overloads the forward slash (/) operator to join path components, so we use it to extend the Path object.
base_path = Path("/home/user")
data_dir = Path("data")
# Combining multiple paths
file_path = base_path / data_dir / ""
print(file_path)
Output
'/home/user/data/'
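The / operator accepts Path objects and plain strings on either side, and joinpath() is its method form. A detail worth knowing: an absolute path on the right-hand side discards everything to its left. In this sketch "prices.csv" is a hypothetical file name, and PurePosixPath is used so the output does not depend on the operating system.

```python
from pathlib import PurePosixPath

base_path = PurePosixPath("/home/user")

# "prices.csv" is a hypothetical file name used only for illustration.
joined = base_path / "data" / "prices.csv"
same = base_path.joinpath("data", "prices.csv")   # method form of the same join
from_str = "/home/user" / PurePosixPath("data")   # a str on the left also works

# An absolute right-hand operand replaces the path built so far.
reset = base_path / "/etc/hosts"

print(joined)          # /home/user/data/prices.csv
print(joined == same)  # True
print(from_str)        # /home/user/data
print(reset)           # /etc/hosts
```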
Create a path object from the current working directory
Here we use the Path.cwd() method to get the current working directory, that is, the directory from which the script is running, and assign it to the cwd variable.
cwd = Path.cwd()
print(cwd)
Output
'/home/bexgboost/articles/2024/4_april/8_pathlib'
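Path.cwd() is the pathlib counterpart of os.getcwd(), and relative paths are resolved against it. The sketch below checks both facts; "report.txt" is a hypothetical file name that does not need to exist, because absolute() only builds a path and never touches the disk.

```python
import os
from pathlib import Path

cwd = Path.cwd()

# Same directory as os.getcwd(), and always an absolute path.
print(cwd == Path(os.getcwd()))   # True
print(cwd.is_absolute())          # True

# absolute() prefixes a relative path with the current working directory.
print(Path("report.txt").absolute() == cwd / "report.txt")  # True
```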
Create a path object from the home directory
We can construct a path by combining the home directory with subdirectories. Here we join the home directory with the subdirectories "downloads" and "projects".
home = Path.home()
home / "downloads" / "projects"
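Unlike a shell, pathlib does not expand "~" on its own: Path("~") is kept literally until we call expanduser(). A short sketch, assuming the usual setup where the home directory can be determined from the environment:

```python
from pathlib import Path

home = Path.home()
projects = home / "downloads" / "projects"

# The tilde is an ordinary character until expanduser() replaces it.
print(str(Path("~")) == "~")   # True
expanded = Path("~/downloads/projects").expanduser()

print(projects == expanded)    # True
```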
Important: Creating a Path object does not perform any file system operations, such as checking that the path exists or creating directories or files. The Path class is designed to represent and manipulate paths. To actually interact with the file system (checking whether a path exists, reading or writing files), we call dedicated methods of the Path object and, in some advanced cases, get help from the os module.
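A minimal sketch of this separation, run inside a temporary directory so nothing outside it is touched. The location "reports/summary.txt" is hypothetical: building the Path object succeeds even though nothing exists there yet, and the disk only changes when we call mkdir() and write_text().

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Constructing the object touches nothing on disk.
    target = Path(tmp) / "reports" / "summary.txt"
    existed_before = target.exists()

    # File system work happens only when we call the corresponding methods.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("Hello, world!")
    existed_after = target.exists()
    content = target.read_text()

print(existed_before, existed_after, content)  # False True Hello, world!
```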
Using Path Components in Python
File path attributes are the properties and components of a file path that help identify and manage files and directories in the file system. Just as a physical address has different parts, such as a street number, city, country, and postal code, a file system path can be broken down into smaller components. pathlib lets us access and manipulate these components through attributes using dot notation.
Using the root directory
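The root directory is the top-level directory in the file system. On Unix-like systems, it is represented by a forward slash (/). On Windows, a path usually starts with a drive letter followed by a colon, such as C:, which pathlib exposes separately through the .drive attribute, while .root holds the backslash that follows it (C:\).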
image_file = home / "downloads" / ""
image_file.root
Output
'/'
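Because the pure path classes only parse strings, we can inspect both the POSIX and the Windows flavour on any operating system. This sketch compares .root, .drive, and .anchor (drive plus root) for an absolute POSIX path, an absolute Windows path, and a relative path; the paths themselves are made up for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

posix = PurePosixPath("/home/user/downloads")
windows = PureWindowsPath("C:/Users/user/Downloads")
relative = PurePosixPath("data/union_data.csv")

print(repr(posix.root), repr(posix.drive))                            # '/' ''
print(repr(windows.drive), repr(windows.root), repr(windows.anchor))  # 'C:' '\\' 'C:\\'
print(repr(relative.root))                                            # '' (relative paths have no root)
```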
Using the parent directory
The .parent attribute returns the directory that contains the current file or directory, one level higher in the hierarchy.
image_file.parent
Output
PosixPath('/home/bexgboost/downloads')
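.parent can be chained, and the related .parents sequence yields every ancestor from the nearest one up to the root. In this sketch "photo.png" is a hypothetical file name standing in for the image file, and PurePosixPath keeps the output platform-independent.

```python
from pathlib import PurePosixPath

# "photo.png" is a hypothetical file name.
image_file = PurePosixPath("/home/bexgboost/downloads/photo.png")

print(image_file.parent)          # /home/bexgboost/downloads
print(image_file.parent.parent)   # /home/bexgboost

# Every ancestor, nearest first, ending at the root.
print([str(p) for p in image_file.parents])
# ['/home/bexgboost/downloads', '/home/bexgboost', '/home', '/']

# The root is its own parent, so climbing upward never fails.
print(PurePosixPath("/").parent)  # /
```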
Using the filename
The .name attribute returns the final component of the path as a string, including the extension.
image_file.name
Output
''
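Two close relatives of .name are .stem, the name without its final suffix, and with_name(), which returns a new path with the last component replaced (the file on disk is not renamed). The file names "photo.png" and "logo.png" below are hypothetical.

```python
from pathlib import PurePosixPath

# "photo.png" is a hypothetical file name.
image_file = PurePosixPath("/home/bexgboost/downloads/photo.png")

print(image_file.name)                    # photo.png
print(image_file.stem)                    # photo
print(image_file.with_name("logo.png"))   # /home/bexgboost/downloads/logo.png

# For a directory path, .name is simply the last component.
print(PurePosixPath("/home/bexgboost/downloads").name)  # downloads
```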
Using the file suffix
The .suffix attribute returns the file extension as a string, including the leading dot (if there is no extension, it returns an empty string).
image_file.suffix
Output
'.png'
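.suffix only reports the last extension, which matters for names such as archive.tar.gz; .suffixes lists them all, and with_suffix() builds a path with a different extension. The file names in this sketch are hypothetical.

```python
from pathlib import PurePosixPath

archive = PurePosixPath("backups/archive.tar.gz")   # hypothetical file
print(archive.suffix)      # .gz  (only the last extension)
print(archive.suffixes)    # ['.tar', '.gz']
print(archive.stem)        # archive.tar

print(repr(PurePosixPath("README").suffix))    # '' (no extension)
print(repr(PurePosixPath(".bashrc").suffix))   # '' (a leading dot is not an extension)
print(PurePosixPath("photo.png").with_suffix(".jpg"))  # photo.jpg
```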
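Note: Case sensitivity depends on the file system. On a default macOS volume, /Users/username/Documents and /users/username/documents refer to the same folder (macOS file systems are case-insensitive unless formatted as case-sensitive), whereas on typical Linux file systems, paths that differ only in case are different.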
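pathlib itself follows the conventions of each path flavour when comparing paths: POSIX-style paths compare case-sensitively and Windows-style paths case-insensitively. This is a pure string comparison; it says nothing about how a particular volume behaves, so for existing files Path.samefile() is the more reliable check.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths compare by their own rules without touching the disk.
posix_equal = PurePosixPath("/Users/username/Documents") == PurePosixPath("/users/username/documents")
windows_equal = PureWindowsPath("C:/Users/username/Documents") == PureWindowsPath("c:/users/username/documents")

print(posix_equal)    # False
print(windows_equal)  # True
```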
.parts attribute
We can use the .parts property to split the Path object into its components.
image_file.parts
Output
('/', 'home', 'bexgboost', 'downloads', '')
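Since .parts is an ordinary tuple, it can be sliced and passed back to the constructor to rebuild a path. On Windows paths the first element is the whole anchor (drive plus root). "photo.png" is a hypothetical file name used in place of the image file.

```python
from pathlib import PurePosixPath, PureWindowsPath

# "photo.png" is a hypothetical file name.
image_file = PurePosixPath("/home/bexgboost/downloads/photo.png")
parts = image_file.parts
print(parts)   # ('/', 'home', 'bexgboost', 'downloads', 'photo.png')

# Unpacking the parts into the constructor rebuilds the same path.
print(PurePosixPath(*parts) == image_file)   # True

# Slicing the tuple drops leading components.
print(PurePosixPath(*parts[3:]))             # downloads/photo.png

# On Windows paths the first part is the anchor.
print(PureWindowsPath("C:/Users/bex/photo.png").parts)  # ('C:\\', 'Users', 'bex', 'photo.png')
```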
Common path operations using pathlib
Here are some examples of common path operations using pathlib:
1. Create a path object:
from pathlib import Path
p = Path("/usr/bin")
2. Path stitching:
new_path = p / "local" / ""
3. Get the file extension:
extension = new_path.suffix
4. Get a path with a different file extension (the file on disk is not renamed):
new_extension = new_path.with_suffix(".txt")
5. Check whether the path exists:
exists = new_path.exists()
6. Check whether the path is a file:
is_file = new_path.is_file()
7. Check whether the path is a directory:
is_dir = new_path.is_dir()
8. Read the file content:
with new_path.open('r') as file:
    content = file.read()
9. Write file content:
with new_path.open('w') as file:
    file.write("Hello, world!")
10. Delete the file:
new_path.unlink()
11. Create a directory:
new_path.mkdir(parents=True, exist_ok=True)
12. Delete the directory (it must be empty):
new_path.rmdir()
13. Get the files and subdirectories in the directory:
items = list(new_path.iterdir())
14. Absolute path:
absolute_path = new_path.absolute()
15. Relative path:
relative_path = new_path.relative_to("/usr")
16. Splitting a path into its components:
drive, root, parts = new_path.drive, new_path.root, new_path.parts
17. Path string conversion:
path_str = str(new_path)
18. Traversing the directory tree recursively:
for child in new_path.rglob('*.py'):
    print(child)
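The snippets above operate on paths under /usr/bin, which usually cannot be written to. As a runnable sketch, the following script applies most of the same operations inside a temporary directory instead; the names project, src, main.py, and README.md are hypothetical. It uses write_text() and read_text(), which are shorthand for the open() calls in items 8 and 9.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # Create a directory tree and write two files.
    src = base / "project" / "src"
    src.mkdir(parents=True, exist_ok=True)
    script = src / "main.py"
    script.write_text("print('hi')\n")
    (base / "project" / "README.md").write_text("# demo\n")

    # Inspect the paths.
    checks = (script.exists(), script.is_file(), src.is_dir())
    rel = script.relative_to(base)
    listing = sorted(p.name for p in (base / "project").iterdir())
    py_files = [p.name for p in base.rglob("*.py")]
    content = script.read_text()

    # Clean up: unlink() removes the file, then rmdir() works because src is empty.
    script.unlink()
    src.rmdir()
    src_gone = not src.exists()

print(checks)           # (True, True, True)
print(rel.as_posix())   # project/src/main.py
print(listing)          # ['README.md', 'src']
print(py_files)         # ['main.py']
print(src_gone)         # True
```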
These are some of the basic operations the pathlib module provides. Together, they make working with file system paths more intuitive and easier to manage.