Sample code for reading Excel files in Python with xlrd

xlrd

Overview

xlrd is a Python library for reading Excel files, so that data can be extracted and analyzed quickly. It is suited to older Excel files (.xls format). For newer Excel files (.xlsx format), other libraries such as openpyxl or pandas are recommended.

document:/en/latest/

Install

First, make sure that xlrd is installed:

pip install xlrd==1.2.0

Note: xlrd 2.0 and later only support the .xls format. Version 1.2.0 is pinned here because it can also read .xlsx files.
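To check which side of that boundary an installed copy falls on, one option is to compare its version string against 2.0. The sketch below is only an illustration: supports_xlsx and installed_xlrd_supports_xlsx are hypothetical helper names, not part of xlrd, and the lookup relies on the standard library's importlib.metadata.

```python
from importlib.metadata import PackageNotFoundError, version


def supports_xlsx(xlrd_version: str) -> bool:
    """Return True if this xlrd version string predates 2.0 (hypothetical helper)."""
    major = int(xlrd_version.split(".")[0])
    return major < 2


def installed_xlrd_supports_xlsx():
    """Check the installed xlrd; return None when xlrd is not installed."""
    try:
        return supports_xlsx(version("xlrd"))
    except PackageNotFoundError:
        return None


print(supports_xlsx("1.2.0"))  # True
print(supports_xlsx("2.0.1"))  # False
print(installed_xlrd_supports_xlsx())
```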

Read Excel file

Open an Excel file using xlrd:

import xlrd

# Open the Excel file
workbook = xlrd.open_workbook('')

Sheet operations

Get a worksheet

Worksheets can be obtained by index or by name:

# 1. Look up by index
sheet = workbook.sheet_by_index(0)

# 2. Look up by sheet name
sheet = workbook.sheet_by_name('Sheet1')

Get the number of rows and columns of the sheet

The nrows and ncols attributes give the dimensions of the worksheet:

# Get the number of rows and columns
num_rows = sheet.nrows
num_cols = sheet.ncols
print(f"Number of rows: {num_rows}, Number of columns: {num_cols}")

Traverse sheets

# Get the sheet count
print(workbook.nsheets)
# Traverse all worksheets in the workbook by index
for i in range(0, workbook.nsheets):
    # Get the current worksheet based on the index
    sheet = workbook.sheet_by_index(i)
    # Print the value of the top-left cell (row 1, column 1) of the current worksheet
    print(sheet.cell_value(0, 0))

# Get all sheet names
print(workbook.sheet_names())
# Traverse all worksheets in the workbook by name
for name in workbook.sheet_names():
    # Get the worksheet object based on the worksheet name
    sheet = workbook.sheet_by_name(name)
    # Print the cell value of the first row and first column of the worksheet
    print(sheet.cell_value(0, 0))

Cell operations

Read cell content

The value of a cell can be read by specifying its row and column indices, both of which start at 0.

# Read a specific cell (for example: first row, first column)
print(sheet.cell_value(0, 0))
# Get the value of the second row and third column
print(sheet.cell_value(1, 2))

# Get the cell object in the second row and third column, then read its value
print(sheet.cell(1, 2).value)
print(sheet.row(1)[2].value)
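Because the indices are zero-based while Excel labels cells in A1 style, a small conversion helper can make lookups less error-prone. The following is a minimal sketch; a1_to_index is a hypothetical name introduced here, not an xlrd function, and its result could be passed to sheet.cell_value(row, col).

```python
import re


def a1_to_index(ref: str) -> tuple:
    """Convert an A1-style reference such as 'C2' to zero-based (row, col)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"Not an A1-style reference: {ref!r}")
    letters, digits = match.groups()
    col = 0
    # Column letters behave like a base-26 number with digits A=1 .. Z=26
    for ch in letters.upper():
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1


print(a1_to_index("A1"))    # (0, 0)
print(a1_to_index("C2"))    # (1, 2)
print(a1_to_index("AA10"))  # (9, 26)
```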

Read cell type

The type of a cell can be obtained with cell_type():

# Get the type of the cell in the first row and first column
cell_type = sheet.cell_type(0, 0)
# 0: EMPTY, 1: TEXT, 2: NUMBER, 3: DATE, 4: BOOLEAN, 5: ERROR, 6: BLANK
print(f"Cell Type: {cell_type}")
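Since cell_type() returns a bare integer, a lookup table can make printed output easier to read. The sketch below hard-codes the values that xlrd exposes as the constants XL_CELL_EMPTY through XL_CELL_BLANK; CELL_TYPE_NAMES and cell_type_name are hypothetical names used only for illustration.

```python
# Integer codes as defined by xlrd's XL_CELL_* constants
CELL_TYPE_NAMES = {
    0: "EMPTY",
    1: "TEXT",
    2: "NUMBER",
    3: "DATE",
    4: "BOOLEAN",
    5: "ERROR",
    6: "BLANK",
}


def cell_type_name(code: int) -> str:
    """Return a readable name for an xlrd cell type code."""
    return CELL_TYPE_NAMES.get(code, f"UNKNOWN({code})")


print(cell_type_name(2))   # NUMBER
print(cell_type_name(99))  # UNKNOWN(99)
```

With a real worksheet this would be called as cell_type_name(sheet.cell_type(0, 0)).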

Iterate through all cells

Iterate through all cells of the worksheet and print their contents:

# Iterate through each row and column of the worksheet to get the value of each cell
for row in range(sheet.nrows):  # sheet.nrows is the total number of rows
    for col in range(sheet.ncols):  # sheet.ncols is the total number of columns
        # Get the cell value at the current position (row, col)
        cell_value = sheet.cell_value(row, col)
        # Print the cell position and value
        print(f"Value at ({row}, {col}): {cell_value}")

Read cells in a specific range

To read only the cells in a specific range, restrict the loop bounds:

# Read the cells at row indices 1 to 3 and column indices 1 to 2 (zero-based)
for row in range(1, 4):
    for col in range(1, 3):
        cell_value = sheet.cell_value(row, col)
        print(f"Value at ({row}, {col}): {cell_value}")

Read cells of different data types

xlrd distinguishes several cell types, including numbers, text, dates, booleans, and errors. Here is an example of how to handle cells of different types:

# Read a specific cell and determine its type
cell_value = sheet.cell_value(1, 1)  # Read the second row and second column
cell_type = sheet.cell_type(1, 1)

if cell_type == xlrd.XL_CELL_NUMBER:  # 2
    print(f"Number: {cell_value}")
elif cell_type == xlrd.XL_CELL_TEXT:  # 1
    print(f"String: {cell_value}")
elif cell_type in (xlrd.XL_CELL_EMPTY, xlrd.XL_CELL_BLANK):  # 0 or 6
    print("Empty cell")
elif cell_type == xlrd.XL_CELL_BOOLEAN:  # 4
    print(f"Boolean value: {cell_value}")
elif cell_type == xlrd.XL_CELL_ERROR:  # 5
    print("Error cell")

Row and column operations

Get the entire row or column of data

You can read a whole row or column at once:

# Get an entire row
row_values = sheet.row_values(0)  # First row
print(f"Values of the first row: {row_values}")

# Get an entire column
col_values = sheet.col_values(0)  # First column
print(f"Values of the first column: {col_values}")

Read all rows in dictionary format

Each row can be read into a dictionary keyed by the header row:

# Assume the first row is the header
header = sheet.row_values(0)

# Initialize a list to store the data of all rows
data = []

# Loop over all rows except the header
for row in range(1, sheet.nrows):
    # Initialize a dictionary to store the data of the current row
    row_data = {}
    # Loop over all columns
    for col in range(sheet.ncols):
        # Add the cell value to the dictionary of the current row, using the header as the key
        row_data[header[col]] = sheet.cell_value(row, col)
    # Add the dictionary of the current row to the data list
    data.append(row_data)

# Print the final data list
print(data)
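The same header-to-value pairing can be written more compactly with dict(zip(...)). This is a sketch that works on a plain list of rows, such as one built from [sheet.row_values(i) for i in range(sheet.nrows)]; rows_to_dicts is a hypothetical helper name and the sample data is invented for illustration.

```python
def rows_to_dicts(rows):
    """Turn a list of rows into dictionaries keyed by the first (header) row."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Invented sample data standing in for values read from a worksheet
sample_rows = [["name", "age"], ["Alice", 30.0], ["Bob", 25.0]]
print(rows_to_dicts(sample_rows))
```

Note that zip stops at the shorter sequence, so a row with fewer cells than the header simply yields fewer keys.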

Read non-null values for specific columns

# Set the column index to 0, meaning the first column
col_index = 0
# Create an empty list to store the non-empty values of the first column
non_empty_values = []

# Iterate through each row of the worksheet to get the value in the first column
for row in range(sheet.nrows):
    # Get the cell value at the given row and column
    value = sheet.cell_value(row, col_index)
    # If the value is not an empty string, add it to the list
    if value != '':
        non_empty_values.append(value)
# Print the non-empty values of the first column
print(f"Non-empty values of the first column: {non_empty_values}")

Other operations

Handle date values

If a cell contains a date, xlrd returns it as a floating point number. The xlrd.xldate_as_tuple() function converts it to a date tuple:

import xlrd

# Assume that the cell in the third row, first column holds a date
date_value = sheet.cell_value(2, 0)
date_tuple = xlrd.xldate_as_tuple(date_value, workbook.datemode)
print(f"date: {date_tuple}")  # The output format is (year, month, day, hour, minute, second)
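A six-element tuple of this shape can be turned into a standard library object for formatting or arithmetic. The sketch below assumes the (year, month, day, hour, minute, second) layout described above and assumes that a time-only value arrives with a zero date part; tuple_to_datetime is a hypothetical helper name, and the sample tuples are invented.

```python
import datetime


def tuple_to_datetime(date_tuple):
    """Convert (year, month, day, hour, minute, second) to a datetime or time."""
    year, month, day, hour, minute, second = date_tuple
    # A zero date part is treated as a time-only value (assumption stated above)
    if (year, month, day) == (0, 0, 0):
        return datetime.time(hour, minute, second)
    return datetime.datetime(year, month, day, hour, minute, second)


print(tuple_to_datetime((2024, 1, 15, 9, 30, 0)))  # 2024-01-15 09:30:00
print(tuple_to_datetime((0, 0, 0, 14, 5, 0)))      # 14:05:00
```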

Handle multiple date formats

Dates in Excel are not always stored in the same way. A function can be written to handle the different cases:

def parse_date(value):
    """
    Parse a date value.

    Convert a floating point number representing a date into a readable date tuple, based on the type of the value.

    Parameters:
        value (float): A floating point number representing a date, usually read from spreadsheet software.

    Returns:
        tuple or None: If the input is a floating point number, a tuple of (year, month, day, hour, minute, second);
                       otherwise None.
    """
    if isinstance(value, float):  # Dates are usually stored as floating point numbers
        return xlrd.xldate_as_tuple(value, workbook.datemode)
    return None

# Traverse every row of the worksheet
for row in range(sheet.nrows):
    # Assuming the first column holds dates, get the date value of the row
    date_value = sheet.cell_value(row, 0)
    # Try to parse the date value
    parsed_date = parse_date(date_value)
    # If parsing succeeds, print the date of the row
    if parsed_date:
        print(f"Date in row {row}: {parsed_date}")

Process empty cells

You can check whether a cell is empty and handle it accordingly:

for row in range(sheet.nrows):
    for col in range(sheet.ncols):
        cell_value = sheet.cell_value(row, col)
        if cell_value == '':
            print(f"({row}, {col}) is an empty cell")
        else:
            print(f"Value at ({row}, {col}): {cell_value}")

Handle error cells

You can check whether a cell has the error type:

# Traverse each cell in the worksheet and find cells of the error type
for row in range(sheet.nrows):
    for col in range(sheet.ncols):
        # Check whether the current cell has the error type
        if sheet.cell_type(row, col) == 5:  # 5 is XL_CELL_ERROR
            print(f"({row}, {col}) is an error cell")

Performance optimization when reading large files

When dealing with very large Excel files, consider reading only the necessary sheets or rows to reduce memory usage. You can use the on_demand parameter of xlrd's open_workbook function:

# Load worksheets only when needed
workbook = xlrd.open_workbook('', on_demand=True)

# The worksheet is loaded when it is accessed
sheet = workbook.sheet_by_index(0)

Combine xlrd with pandas

If more powerful data processing is required, xlrd can be used together with pandas. First read the data with xlrd and then convert it to a DataFrame:

import pandas as pd
import xlrd

# Open the file
workbook = xlrd.open_workbook('')

# Get the first worksheet by index
sheet = workbook.sheet_by_index(0)

# Iterate through each row in the worksheet and collect the rows in a list
data = []
for row in range(sheet.nrows):
    data.append(sheet.row_values(row))

# Create a DataFrame: data[0] (the header row) supplies the column names, data[1:] the data
df = pd.DataFrame(data[1:], columns=data[0])
# Print the DataFrame
print(df)

Custom data processing

You can apply custom processing logic when reading cells, such as formatting numbers as currency:

def format_currency(value):
    """
    Format the given value as currency.

    Parameters:
    value (int, float): The value to format.

    Returns:
    str: The formatted string. If the input is not an integer or a floating point number, the original value is returned.
    """
    # Format as currency with a leading dollar sign and two decimal places
    return f"${value:,.2f}" if isinstance(value, (int, float)) else value

# Traverse every row and column of the worksheet
for row in range(sheet.nrows):
    for col in range(sheet.ncols):
        # Get the value of the current cell
        cell_value = sheet.cell_value(row, col)
        # Format the cell value as currency
        formatted_value = format_currency(cell_value)
        # Print the formatted value
        print(f"Formatted value at ({row}, {col}): {formatted_value}")

Extract specific data using regular expressions

If you need to extract data in a specific format from a cell, you can use regular expressions

import re

# Define a regular expression pattern that matches strings containing the keyword "Java"
pattern = r'Java'

# Iterate through all rows of the worksheet to find strings that match the pattern
for row in range(sheet.nrows):
    # Get the cell value of the second column of the current row
    cell_value = sheet.cell_value(row, 1)
    # Search the cell value for the pattern; str() guards against numeric cells
    matches = re.findall(pattern, str(cell_value))
    # If a match is found, print the row number and the matches
    if matches:
        print(f"Matches found in row {row}: {matches}")
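When the target is a structured format rather than a fixed keyword, capture groups let you pull out the individual parts. The sketch below is purely illustrative: the "ORD-1234" identifier format, the ORDER_ID pattern, and extract_order_ids are all hypothetical, and the input is a plain list standing in for values such as sheet.col_values(1).

```python
import re

# Hypothetical identifier format: three capital letters, a dash, four digits
ORDER_ID = re.compile(r"\b([A-Z]{3})-(\d{4})\b")


def extract_order_ids(values):
    """Return (prefix, number) pairs found in the string values of a column."""
    found = []
    for value in values:
        # Numeric and boolean cells cannot contain the pattern, so skip them
        if not isinstance(value, str):
            continue
        for prefix, number in ORDER_ID.findall(value):
            found.append((prefix, int(number)))
    return found


# Invented sample data standing in for one worksheet column
sample = ["Shipped ORD-1024 and ORD-2048", 42.0, "", "no id here"]
print(extract_order_ids(sample))  # [('ORD', 1024), ('ORD', 2048)]
```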

Handle conditional formatting

Although xlrd does not support reading conditional formatting, the same rules can be applied manually in code according to your business logic:

# Traverse all rows of the worksheet and check whether the value in the fourth column exceeds 2800
for row in range(sheet.nrows):
    # Get the cell value of the fourth column of the current row
    cell_value = sheet.cell_value(row, 3)

    # Check whether the cell value is a number
    if isinstance(cell_value, (int, float)):
        # Apply the condition
        if cell_value > 2800:
            print(f"The value in row {row} exceeds the limit: {cell_value}")

Custom class encapsulation read logic

Read logic can be encapsulated in a class for easy reuse and extension

import xlrd

class ExcelReader:
    """
     Class used to read Excel files.

     Attributes:
         workbook:
             Open Excel workbook.
     """
    def __init__(self, file_path):
        """
         Initialize the ExcelReader instance.

         Parameters: file_path: str The path to the Excel file.
         """
        self.workbook = xlrd.open_workbook(file_path)

    def get_sheet(self, index):
        """
         Get the worksheet based on the index.

         Parameters: index: int The index of the worksheet.

         Return: sheet: The worksheet for the specified index.
         """
        return self.workbook.sheet_by_index(index)

    def get_row_values(self, sheet, row_index):
        """
         Gets all values of a row.

         parameter:
             sheet : Worksheet.
             row_index : int row index.

         return:
             row_values : list All values of the specified row.
         """
        return sheet.row_values(row_index)

# Read an Excel file using the custom class
reader = ExcelReader('')

# Get the first worksheet
sheet = reader.get_sheet(0)
# Get all values in the second row (index 1)
row_values = reader.get_row_values(sheet, 1)
print(row_values)
