xlrd
Overview
xlrd is a Python library for reading Excel files. It helps users quickly extract data and analyze it. It is suitable for reading data from older Excel files (.xls format). For newer Excel files, other libraries such as openpyxl or pandas are recommended.
Documentation: /en/latest/
Install
First, make sure that xlrd is installed:
pip install xlrd==1.2.0
Note: xlrd 2.0 and later can only read the legacy .xls format, so the command above pins version 1.2.0, which can still read .xlsx files.
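Because .xlsx support depends on which xlrd version is installed, a script can check the version before opening files. The sketch below is only an illustration, not part of xlrd: the helpers installed_xlrd_version() and xlrd_supports_xlsx() are hypothetical, and it assumes Python 3.8+ for importlib.metadata.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_xlrd_version():
    """Return the installed xlrd version string, or None if xlrd is not installed."""
    try:
        return version("xlrd")
    except PackageNotFoundError:
        return None


def xlrd_supports_xlsx(version_string):
    """The 1.x line (e.g. 1.2.0) can still read .xlsx; xlrd 2.0 and later read .xls only."""
    major = int(version_string.split(".")[0])
    return major < 2


found = installed_xlrd_version()
if found is None:
    print("xlrd is not installed")
elif not xlrd_supports_xlsx(found):
    print(f"xlrd {found} can only read .xls files")
```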
Read Excel file
Open an Excel file using xlrd
import xlrd

# Open the Excel file (put the path to your .xls/.xlsx file between the quotes)
workbook = xlrd.open_workbook('')
Sheet operations
Get a worksheet
Worksheets can be obtained by index or by name:
# 1. Get a worksheet by index
sheet = workbook.sheet_by_index(0)
# 2. Get a worksheet by name
sheet = workbook.sheet_by_name('Sheet1')
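sheet_by_name() raises an xlrd.XLRDError when no sheet has the given name. A minimal sketch of a fallback, assuming you would rather get a default sheet than an exception; find_sheet() is a hypothetical helper that only needs an object with xlrd Book's sheet_names(), sheet_by_name() and sheet_by_index() methods:

```python
def find_sheet(workbook, name, default_index=0):
    """Return the sheet called `name`, or the sheet at `default_index` if there is none.

    `workbook` is expected to behave like an xlrd Book.
    """
    if name in workbook.sheet_names():
        return workbook.sheet_by_name(name)
    # No sheet with that name: fall back to a sheet chosen by position
    return workbook.sheet_by_index(default_index)
```

For example, find_sheet(workbook, 'Sheet1') returns the first sheet when a file has no sheet named Sheet1.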
Get the number of rows and columns of the sheet
The nrows and ncols attributes of a worksheet hold its number of rows and columns:
# Get the number of rows and columns
num_rows = sheet.nrows
num_cols = sheet.ncols
print(f"Number of rows: {num_rows}, Number of columns: {num_cols}")
Traverse worksheets
# Get the number of worksheets
print(workbook.nsheets)
# Traverse all worksheets in the workbook by index
for i in range(workbook.nsheets):
    # Get the current worksheet by its index
    sheet = workbook.sheet_by_index(i)
    # Print the value of the top-left cell (first row, first column), skipping empty sheets
    if sheet.nrows > 0 and sheet.ncols > 0:
        print(sheet.cell_value(0, 0))
# Get all worksheet names
print(workbook.sheet_names())
# Traverse all worksheets in the workbook by name
for name in workbook.sheet_names():
    # Get the worksheet object by its name
    sheet = workbook.sheet_by_name(name)
    # Print the value of the first row, first column, skipping empty sheets
    if sheet.nrows > 0 and sheet.ncols > 0:
        print(sheet.cell_value(0, 0))
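To get an overview of a workbook without reading any cell (calling cell_value(0, 0) on an empty sheet raises IndexError), the size of each sheet can be collected instead. A small sketch; describe_sheets() is a hypothetical helper relying on Book.sheets() and on each Sheet's name, nrows and ncols:

```python
def describe_sheets(workbook):
    """Map each sheet name to its (rows, columns) size.

    Works with an xlrd Book: sheets() returns every Sheet object.
    """
    return {sheet.name: (sheet.nrows, sheet.ncols) for sheet in workbook.sheets()}
```

For example, print(describe_sheets(workbook)) shows every sheet name with its size, including sheets that are empty.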
Cell operations
Read cell content
The value of a cell can be read by specifying the index of rows and columns.
# Read a specific cell (for example: first row, first column)
print(sheet.cell_value(0, 0))
# Get the value of the second row, third column
print(sheet.cell_value(1, 2))
# Get the Cell object in the second row, third column, then its value
print(sheet.cell(1, 2).value)
# sheet.row(1) returns the second row as a list of Cell objects
print(sheet.row(1)[2].value)
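xlrd addresses cells with 0-based (row, column) indices, while Excel shows references such as C2. When reporting cell positions to users, a small conversion helps; to_a1() below is a hypothetical helper, not an xlrd function:

```python
def to_a1(row, col):
    """Convert 0-based (row, col) indices, as used by xlrd, into an A1-style reference."""
    letters = ""
    col += 1  # switch to 1-based numbering for the base-26 column letters
    while col > 0:
        col, remainder = divmod(col - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return f"{letters}{row + 1}"


print(to_a1(1, 2))  # the cell read by sheet.cell_value(1, 2); prints C2
```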
Read cell type
You can get the type code of a cell with cell_type():
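# Get the type of the cell in the first row, first column
cell_type = sheet.cell_type(0, 0)
# 0: EMPTY, 1: TEXT, 2: NUMBER, 3: DATE, 4: BOOLEAN, 5: ERROR, 6: BLANK
# (also available as xlrd.XL_CELL_EMPTY, xlrd.XL_CELL_TEXT, ..., xlrd.XL_CELL_BLANK)
print(f"Cell type: {cell_type}")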
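Because cell_type() returns a bare integer, a lookup table makes the codes readable. This sketch simply spells out xlrd's XL_CELL_* values (xlrd also keeps a similar table, ctype_text, in its sheet module); CELL_TYPE_NAMES and describe_cell_type() are hypothetical names:

```python
# Cell type codes with the meaning xlrd gives them (xlrd.XL_CELL_EMPTY ... xlrd.XL_CELL_BLANK)
CELL_TYPE_NAMES = {
    0: "EMPTY",    # empty cell, value ''
    1: "TEXT",     # value is a str
    2: "NUMBER",   # value is a float
    3: "DATE",     # value is a float date serial number
    4: "BOOLEAN",  # value is 1 or 0
    5: "ERROR",    # value is an internal error code
    6: "BLANK",    # formatted but empty; only reported with formatting_info=True
}


def describe_cell_type(code):
    """Return a readable name for an xlrd cell type code."""
    return CELL_TYPE_NAMES.get(code, f"UNKNOWN({code})")
```

For example, describe_cell_type(sheet.cell_type(0, 0)) gives "NUMBER" for a numeric cell.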
Iterate through all cells
Iterate through all cells of the entire worksheet and print the content
# Iterate through each row and column of the sheet to get the value of every cell
for row in range(sheet.nrows):  # total number of rows
    for col in range(sheet.ncols):  # total number of columns
        # Get the value of the cell at (row, col)
        cell_value = sheet.cell_value(row, col)
        # Print the cell position and its value
        print(f"Value at ({row}, {col}): {cell_value}")
Read cells in a specific range
If you want to read only cells of a specific range, you can use the following method
# Read rows with index 1-3 and columns with index 1-2
# (indices are 0-based, so these are the 2nd-4th rows and the 2nd-3rd columns)
for row in range(1, 4):
    for col in range(1, 3):
        cell_value = sheet.cell_value(row, col)
        print(f"Value at ({row}, {col}): {cell_value}")
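That loop raises IndexError if the sheet has fewer rows or columns than requested. A sketch that clips the range to the sheet's nrows and ncols first; read_range() is a hypothetical helper taking inclusive 0-based bounds:

```python
def read_range(sheet, first_row, last_row, first_col, last_col):
    """Return the values in rows first_row..last_row and columns first_col..last_col (inclusive).

    The bounds are clipped to sheet.nrows / sheet.ncols, so asking for more than
    the sheet holds returns what exists instead of raising IndexError.
    """
    last_row = min(last_row, sheet.nrows - 1)
    last_col = min(last_col, sheet.ncols - 1)
    return [
        [sheet.cell_value(row, col) for col in range(first_col, last_col + 1)]
        for row in range(first_row, last_row + 1)
    ]
```

read_range(sheet, 1, 3, 1, 2) covers the same cells as the loop, returned as a list of rows.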
Read cells of different data types
xlrd distinguishes several cell types: numbers, text, dates, booleans, errors, and empty cells. The following example branches on the type of a cell:
import xlrd

# Read the cell in the second row, second column and check its type
cell_value = sheet.cell_value(1, 1)
cell_type = sheet.cell_type(1, 1)
if cell_type == xlrd.XL_CELL_NUMBER:
    print(f"Number: {cell_value}")
elif cell_type == xlrd.XL_CELL_TEXT:
    print(f"String: {cell_value}")
elif cell_type == xlrd.XL_CELL_DATE:
    print(f"Date serial number: {cell_value}")
elif cell_type in (xlrd.XL_CELL_EMPTY, xlrd.XL_CELL_BLANK):
    print("Empty cell")
elif cell_type == xlrd.XL_CELL_BOOLEAN:
    print(f"Boolean value: {bool(cell_value)}")
elif cell_type == xlrd.XL_CELL_ERROR:
    print(f"Error cell (code {cell_value})")
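xlrd returns every number as a float (so 3 arrives as 3.0) and booleans as 1 or 0. A sketch of a converter to more natural Python values based on the same type codes; to_python_value() is hypothetical and leaves dates and errors untouched, since those need the workbook's datemode or an error-code table:

```python
def to_python_value(cell_type, value):
    """Convert an xlrd (cell_type, value) pair into a more natural Python value.

    Type codes follow xlrd: 0 EMPTY, 1 TEXT, 2 NUMBER, 3 DATE, 4 BOOLEAN, 5 ERROR, 6 BLANK.
    """
    if cell_type in (0, 6):  # EMPTY / BLANK
        return None
    if cell_type == 2:  # NUMBER: turn whole-number floats into ints
        return int(value) if float(value).is_integer() else value
    if cell_type == 4:  # BOOLEAN: 1/0 -> True/False
        return bool(value)
    return value  # TEXT, DATE and ERROR are returned unchanged
```

Usage: to_python_value(sheet.cell_type(1, 1), sheet.cell_value(1, 1)).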
Row and column operations
Get the entire row or column of data
You can get data for whole rows or columns
# Get an entire row
row_values = sheet.row_values(0)  # first row
print(f"Values of the first row: {row_values}")
# Get an entire column
col_values = sheet.col_values(0)  # first column
print(f"Values of the first column: {col_values}")
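row_values() and col_values() also accept start and end positions; for example, col_values(colx, start_rowx=1) skips the header row. A sketch that looks up a column by its header text; column_by_header() is a hypothetical helper:

```python
def column_by_header(sheet, header_name, header_row=0):
    """Return the values below `header_name` in the header row, or None if it is absent.

    Relies on xlrd's Sheet.row_values(rowx) and Sheet.col_values(colx, start_rowx=...).
    """
    headers = sheet.row_values(header_row)
    if header_name not in headers:
        return None
    col_index = headers.index(header_name)
    # Start just below the header row
    return sheet.col_values(col_index, start_rowx=header_row + 1)
```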
Read all rows in dictionary format
Read each row into a dictionary, using the header row as keys:
# Assume the first row is the header
header = sheet.row_values(0)
# List that will hold one dictionary per data row
data = []
# Traverse all rows except the header
for row in range(1, sheet.nrows):
    # Dictionary for the current row
    row_data = {}
    # Traverse all columns
    for col in range(sheet.ncols):
        # Use the header as the key and the cell value as the value
        row_data[header[col]] = sheet.cell_value(row, col)
    # Add the current row's dictionary to the list
    data.append(row_data)
# Print the final list
print(data)
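The same result can be written more compactly with zip(), which pairs each header with the value in the same column of a row. A sketch; sheet_to_dicts() is a hypothetical helper using only nrows and row_values():

```python
def sheet_to_dicts(sheet, header_row=0):
    """Return one dict per data row, keyed by the values of the header row."""
    header = sheet.row_values(header_row)
    return [
        dict(zip(header, sheet.row_values(row)))
        for row in range(header_row + 1, sheet.nrows)
    ]
```

If two header cells share a name, the later column overwrites the earlier one in each dictionary.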
Read non-empty values from a specific column
# Index of the column to read: 0 is the first column
col_index = 0
# List for the non-empty values of that column
non_empty_values = []
# Go through every row of the sheet
for row in range(sheet.nrows):
    # Get the value of the cell in this row and the chosen column
    value = sheet.cell_value(row, col_index)
    # Empty cells are returned as '', so keep everything else
    if value != '':
        non_empty_values.append(value)
# Print the non-empty values of the first column
print(f"Non-empty values of the first column: {non_empty_values}")
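The check value != '' still keeps cells that contain only spaces. A variant that also treats whitespace-only text as empty, applied to a list such as the one sheet.col_values(0) returns; non_empty() is a hypothetical helper:

```python
def non_empty(values):
    """Keep values that are not empty or whitespace-only strings.

    Only str values are tested, so numbers such as 0.0 are kept.
    """
    return [v for v in values if not (isinstance(v, str) and v.strip() == "")]
```

Usage: non_empty(sheet.col_values(0)).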
Other operations
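Handling dates
If a cell contains a date, xlrd returns it as a floating-point serial number. The xlrd.xldate_as_tuple() function converts it into a date tuple; its second argument, the workbook's datemode, tells it whether the file uses the 1900 or the 1904 date system: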
import xlrd

# Assume the cell in the third row, first column holds a date
date_value = sheet.cell_value(2, 0)
date_tuple = xlrd.xldate_as_tuple(date_value, workbook.datemode)
print(f"Date: {date_tuple}")  # (year, month, day, hour, minute, second)
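To show what happens to the serial number, here is a sketch of the conversion itself for the 1900 (datemode 0) and 1904 (datemode 1) date systems. In real code, xlrd's own xldate_as_tuple() or xlrd.xldate.xldate_as_datetime() should be preferred; serial_to_datetime() is a hypothetical helper:

```python
from datetime import datetime, timedelta

# Day zero of each Excel date system: datemode 0 (1900 system) and 1 (1904 system).
EPOCHS = {0: datetime(1899, 12, 30), 1: datetime(1904, 1, 1)}


def serial_to_datetime(serial, datemode=0):
    """Convert an Excel date serial number (as returned by cell_value) to a datetime.

    The integer part counts days from the epoch; the fractional part is the time
    of day. In the 1900 system, serials 1 to 60 are rejected: Excel counts a
    non-existent 29 February 1900, so the epoch used here is off by one day before
    1 March 1900 (xlrd also raises an error for these). Serials below 1 are
    times only and come back attached to the epoch date.
    """
    if serial < 0:
        raise ValueError(f"negative date serial: {serial}")
    if datemode == 0 and 1 <= serial < 61:
        raise ValueError(f"ambiguous 1900-system date serial: {serial}")
    # Round to whole seconds to absorb floating-point noise in the time part.
    return EPOCHS[datemode] + timedelta(seconds=round(serial * 86400))
```

For example, the serial 43831.0 in a 1900-system workbook corresponds to 1 January 2020.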
Handle multiple date formats
Dates in Excel are not always stored the same way, so it helps to wrap the conversion in a function:
import xlrd

def parse_date(value, datemode):
    """
    Convert a cell value representing a date into a date tuple.

    Parameters:
        value: the cell value; dates are stored as floating-point numbers.
        datemode: the workbook's date system (workbook.datemode).
    Returns:
        tuple or None: (year, month, day, hour, minute, second) if the value is
        a float that xlrd can convert; otherwise None.
    """
    if isinstance(value, float):  # dates are stored as floating-point numbers
        try:
            return xlrd.xldate_as_tuple(value, datemode)
        except ValueError:  # xlrd's XLDateError is a subclass of ValueError
            return None
    return None

# Traverse every row of the sheet
for row in range(sheet.nrows):
    # Assume the first column holds dates (plain numbers are floats too;
    # check sheet.cell_type(row, 0) == xlrd.XL_CELL_DATE to be sure)
    date_value = sheet.cell_value(row, 0)
    # Try to parse the date value
    parsed_date = parse_date(date_value, workbook.datemode)
    # If parsing succeeds, print the date found in this row
    if parsed_date:
        print(f"Date in row {row}: {parsed_date}")
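Some sheets also contain dates typed as text, such as "2024-01-31", which a float-only parser does not recognize. A sketch that tries a list of text layouts with datetime.strptime(); the format list is an assumption to adapt to your data, and its order matters for ambiguous inputs such as 01/02/2024. parse_text_date() is a hypothetical helper:

```python
from datetime import datetime

# Hypothetical list of text layouts to try, most specific first; adapt it to your data.
TEXT_DATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%d/%m/%Y", "%Y/%m/%d")


def parse_text_date(value, formats=TEXT_DATE_FORMATS):
    """Parse a date stored as text in a cell; return a datetime or None."""
    if not isinstance(value, str):
        return None
    text = value.strip()
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this layout did not match, try the next one
    return None
```

A cell that the serial-number conversion rejects can then be passed to parse_text_date() as a second attempt.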
Process empty cells
You can check whether the cell is empty and process it accordingly
for row in range(sheet.nrows):
    for col in range(sheet.ncols):
        cell_value = sheet.cell_value(row, col)
        if cell_value == '':
            print(f"({row}, {col}) is an empty cell")
        else:
            print(f"Value at ({row}, {col}): {cell_value}")
Handle error cells
You can check whether a cell is an error cell (for example #DIV/0! or #N/A):
import xlrd

# Traverse every cell of the sheet and report error cells
for row in range(sheet.nrows):
    for col in range(sheet.ncols):
        # XL_CELL_ERROR (5) marks an error cell
        if sheet.cell_type(row, col) == xlrd.XL_CELL_ERROR:
            print(f"({row}, {col}) is an error cell")
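The value of an error cell is an internal code rather than the text Excel displays. A sketch that collects error cells together with their display text; the code table below should match xlrd's own error_text_from_code, and find_error_cells() is a hypothetical helper:

```python
# BIFF error codes and the text Excel shows for them
ERROR_TEXT = {
    0x00: "#NULL!",
    0x07: "#DIV/0!",
    0x0F: "#VALUE!",
    0x17: "#REF!",
    0x1D: "#NAME?",
    0x24: "#NUM!",
    0x2A: "#N/A",
}
XL_CELL_ERROR = 5  # same value as xlrd.XL_CELL_ERROR


def find_error_cells(sheet):
    """Return (row, col, error_text) for every error cell in an xlrd-style sheet."""
    found = []
    for row in range(sheet.nrows):
        for col in range(sheet.ncols):
            if sheet.cell_type(row, col) == XL_CELL_ERROR:
                code = sheet.cell_value(row, col)
                found.append((row, col, ERROR_TEXT.get(code, f"#ERR{code}")))
    return found
```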
Performance optimization when reading large files
When dealing with very large Excel files, consider loading only the sheets you actually need to reduce memory usage. xlrd's open_workbook() function accepts an on_demand parameter for this:
# Load worksheets only when they are accessed
workbook = xlrd.open_workbook('', on_demand=True)
# The worksheet is loaded at this point
sheet = workbook.sheet_by_index(0)
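With on_demand=True, a sheet that has been processed can be dropped again with Book.unload_sheet(), and Book.release_resources() frees what xlrd keeps around for on-demand loading. A sketch of processing one sheet at a time; process_sheets_on_demand() and its handle_row callback are hypothetical:

```python
def process_sheets_on_demand(workbook, handle_row):
    """Pass every row of every sheet to handle_row(sheet_name, row_values).

    Intended for a Book opened with xlrd.open_workbook(path, on_demand=True),
    so only one sheet is held in memory at a time. Returns the row count per sheet.
    """
    counts = {}
    try:
        for name in workbook.sheet_names():
            sheet = workbook.sheet_by_name(name)  # loads this sheet only
            for row in range(sheet.nrows):
                handle_row(name, sheet.row_values(row))
            counts[name] = sheet.nrows
            workbook.unload_sheet(name)  # drop it before loading the next one
    finally:
        workbook.release_resources()  # free the data kept for on-demand loading
    return counts
```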
Using xlrd together with pandas
If you need more powerful data processing, xlrd can be combined with pandas: read the data with xlrd first, then convert it to a DataFrame:
import pandas as pd
import xlrd

# Open the file
workbook = xlrd.open_workbook('')
# Get the first worksheet by index
sheet = workbook.sheet_by_index(0)
# Collect every row of the worksheet as a list of values
data = []
for row in range(sheet.nrows):
    data.append(sheet.row_values(row))
# Create a DataFrame: the first row (data[0]) is the header, the remaining rows (data[1:]) are the data
df = pd.DataFrame(data[1:], columns=data[0])
# Print the DataFrame
print(df)
Custom data processing
You can apply custom processing while reading cells, for example formatting numbers as currency:
def format_currency(value):
    """
    Format the given value as currency.

    Parameters:
        value (int, float): the value to format.
    Returns:
        str: the formatted string, or the original value if the input is not
        an integer or floating-point number.
    """
    # Format as currency: a dollar sign, thousands separators, two decimal places
    return f"${value:,.2f}" if isinstance(value, (int, float)) else value

# Traverse every row and column of the sheet
for row in range(sheet.nrows):
    for col in range(sheet.ncols):
        # Get the value of the current cell
        cell_value = sheet.cell_value(row, col)
        # Format the cell value as currency
        formatted_value = format_currency(cell_value)
        # Print the formatted value
        print(f"Formatted value at ({row}, {col}): {formatted_value}")
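Checking isinstance(value, (int, float)) also formats dates (stored as floats) and booleans (returned as 1 or 0) as money. A sketch that relies on the cell type instead; format_currency_cell() is a hypothetical helper, and 2 is xlrd's XL_CELL_NUMBER code:

```python
XL_CELL_NUMBER = 2  # same value as xlrd.XL_CELL_NUMBER


def format_currency_cell(cell_type, value, symbol="$"):
    """Format a cell as currency only when xlrd reports it as a NUMBER; otherwise return it unchanged."""
    if cell_type == XL_CELL_NUMBER:
        sign = "-" if value < 0 else ""
        return f"{sign}{symbol}{abs(value):,.2f}"
    return value
```

Usage: format_currency_cell(sheet.cell_type(row, col), sheet.cell_value(row, col)).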
Extract specific data using regular expressions
If you need to extract data in a specific format from a cell, you can use regular expressions
import re

# Regular expression pattern matching the keyword "Java"
pattern = r'Java'
# Go through every row of the worksheet looking for text that matches the pattern
for row in range(sheet.nrows):
    # Get the value of the second column of the current row
    cell_value = sheet.cell_value(row, 1)
    # Find all matches; str() guards against numeric cells, which re cannot search
    matches = re.findall(pattern, str(cell_value))
    # If matches were found, print the row number and the matches
    if matches:
        print(f"Matches found in row {row}: {matches}")
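Capture groups make it possible to extract part of a match instead of only detecting it. A sketch that pulls an optional version number after "Java" from a list of column values (for example sheet.col_values(1)), while ignoring words such as "JavaScript"; the pattern and java_versions() are hypothetical:

```python
import re

# Hypothetical pattern: "Java" not followed by another letter, then an optional version number.
JAVA_VERSION = re.compile(r"\bJava(?![A-Za-z])\s*(\d+)?")


def java_versions(values):
    """Return (row_index, version_or_None) for every 'Java' mention in a list of cell values.

    str() is applied to each value so numeric cells never reach the regex as floats.
    """
    found = []
    for row, value in enumerate(values):
        for match in JAVA_VERSION.finditer(str(value)):
            found.append((row, match.group(1)))
    return found
```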
Handling conditional formatting
xlrd cannot read conditional formatting rules, but you can apply equivalent business rules yourself while reading the data:
# Go through every row and check whether the value in the fourth column exceeds 2800
for row in range(sheet.nrows):
    # Get the value of the fourth column of the current row
    cell_value = sheet.cell_value(row, 3)
    # Only compare numeric values
    if isinstance(cell_value, (int, float)):
        # Apply the business rule
        if cell_value > 2800:
            print(f"Row {row} exceeds the limit: {cell_value}")
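When there are several rules, keeping them in a table makes them easier to extend than nested if statements. A sketch working on a column of values such as sheet.col_values(3); the rules and flag_values() are hypothetical:

```python
# Hypothetical business rules standing in for conditional formatting:
# each rule is a (label, test) pair, and a value receives every matching label.
RULES = [
    ("over limit", lambda v: v > 2800),
    ("negative", lambda v: v < 0),
]


def flag_values(values, rules=RULES):
    """Return {row_index: [labels]} for numeric values that match at least one rule."""
    flags = {}
    for row, value in enumerate(values):
        # xlrd returns numbers as floats; text, empty cells ('') and booleans (1/0) are skipped
        if not isinstance(value, float):
            continue
        labels = [label for label, test in rules if test(value)]
        if labels:
            flags[row] = labels
    return flags
```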
Encapsulating the read logic in a custom class
The read logic can be wrapped in a class for easy reuse and extension:
import xlrd

class ExcelReader:
    """
    Class used to read Excel files.

    Attributes:
        workbook: the opened Excel workbook.
    """

    def __init__(self, file_path):
        """
        Initialize the ExcelReader instance.

        Parameters:
            file_path (str): path to the Excel file.
        """
        self.workbook = xlrd.open_workbook(file_path)

    def get_sheet(self, index):
        """
        Get a worksheet by index.

        Parameters:
            index (int): index of the worksheet.
        Returns:
            The worksheet at the given index.
        """
        return self.workbook.sheet_by_index(index)

    def get_row_values(self, sheet, row_index):
        """
        Get all values of a row.

        Parameters:
            sheet: the worksheet.
            row_index (int): index of the row.
        Returns:
            list: all values of the given row.
        """
        return sheet.row_values(row_index)

# Read the Excel file with the custom class
reader = ExcelReader('')
# Get the first worksheet
sheet = reader.get_sheet(0)
# Get all values in the first row
row_values = reader.get_row_values(sheet, 0)
print(row_values)