
Pandas with SQLite3 in Practice

Make data analysis more efficient! Use Pandas to read and write SQLite3 data directly, and say goodbye to hand-assembling SQL statements!

1 Environment preparation

Make sure pandas and sqlite3 are available (the former needs to be installed separately; the latter is built into Python):

pip install pandas
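
As a quick sanity check (a minimal sketch, nothing database-specific yet), you can confirm that both libraries import and see which versions you have:

import sqlite3
import pandas as pd

print(pd.__version__)          # pandas version installed via pip
print(sqlite3.sqlite_version)  # version of the SQLite engine bundled with Python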

2 Read data from SQLite3 into a DataFrame

Basic usage: Read the entire table

import pandas as pd
import sqlite3

# Connect to the database (file name assumed for illustration)
conn = sqlite3.connect('example.db')

# Read the users table into a DataFrame
df = pd.read_sql('SELECT * FROM users', conn)
print(df.head())  # View the first 5 rows of data

# Close the connection
conn.close()

Advanced Usage: Filtering and Aggregation

query = '''
     SELECT
         name,
         AVG(age) as avg_age -- Calculate the average age
     FROM users
     WHERE age > 20
     GROUP BY name
 '''
df = pd.read_sql(query, conn)
print(df)
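
If the filter value comes from Python code, it is safer to pass it through the params argument of pd.read_sql than to splice it into the SQL string yourself. A minimal sketch (the threshold and file name are arbitrary):

with sqlite3.connect('example.db') as conn:
    min_age = 20
    df = pd.read_sql(
        'SELECT name, age FROM users WHERE age > ?',  # ? is sqlite3's placeholder style
        conn,
        params=(min_age,)
    )
    print(df)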

3 Write DataFrame to SQLite3

Basic write (full overwrite)

# Reconnect (the connection was closed above; file name assumed for illustration)
conn = sqlite3.connect('example.db')

# Create a sample DataFrame
data = {
    'name': ['David', 'Eve'],
    'age': [28, 32],
    'email': ['david@example.com', 'eve@example.com']
}
df = pd.DataFrame(data)

# Write to the users table (full overwrite)
df.to_sql(
    name='users',         # Table name
    con=conn,             # Database connection
    if_exists='replace',  # If the table exists, replace it (use with caution!)
    index=False           # Do not save the DataFrame index as a column
)
conn.commit()

Append data (incremental write)

df.to_sql(
    name='users',
    con=conn,
    if_exists='append',  # Append to the existing table
    index=False
)
conn.commit()
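
To confirm the append actually landed, a quick count query works (illustrative only):

row_count = pd.read_sql('SELECT COUNT(*) AS n FROM users', conn)
print(row_count)  # Number of rows now in the users table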

4 Practical scenarios: Data cleaning + storage

Suppose there is a CSV file dirty_data.csv that needs to be cleaned and stored in SQLite3:

id,name,age,email
1, Alice,30,alice@example.com
2, Bob , invalid, bob@example.com   # Invalid age
3, Charlie,35,                      # Missing email

Step 1: Clean the data with Pandas

# Read the CSV
df = pd.read_csv('dirty_data.csv')

# Cleaning operations
df['age'] = pd.to_numeric(df['age'], errors='coerce')  # Convert invalid ages to NaN
df = df.dropna(subset=['age'])                         # Drop rows with an invalid age
df['email'] = df['email'].fillna('unknown')            # Fill in missing emails
df['name'] = df['name'].str.strip()                    # Strip whitespace around names
print(df)

Step 2: Write to the database

with sqlite3.connect('example.db') as conn:
    # Write to a new table, cleaned_users
    df.to_sql('cleaned_users', conn, index=False, if_exists='replace')

    # Verify the write result
    df_check = pd.read_sql('SELECT * FROM cleaned_users', conn)
    print(df_check)

5 Performance optimization: Write big data in chunks

When processing very large datasets (e.g., 100,000 rows or more), avoid loading everything into memory at once:

# Read the CSV in chunks (1,000 rows per read)
chunk_iter = pd.read_csv('big_data.csv', chunksize=1000)

with sqlite3.connect('big_db.db') as conn:
    for chunk in chunk_iter:
        # Apply simple processing to each chunk
        chunk['timestamp'] = pd.to_datetime(chunk['timestamp'])
        # Write the chunk to the database
        chunk.to_sql(
            name='big_table',
            con=conn,
            if_exists='append',  # Append mode
            index=False
        )
    print("Writing all is done!")

6 Advanced Tips: Perform SQL Operations Directly

Although Pandas is powerful, complex queries are still best written directly in SQL:

# Create a temporary DataFrame
df = pd.DataFrame({'product': ['A', 'B', 'C'], 'price': [10, 200, 150]})

# Write to the products table
df.to_sql('products', conn, index=False, if_exists='replace')

# Execute a complex query (join the users, orders and products tables)
# (column names below are assumed for illustration; adjust them to your schema)
query = '''
    SELECT
        u.name,
        p.product,
        p.price
    FROM users u
    JOIN orders o ON u.id = o.user_id
    JOIN products p ON o.product_id = p.id
    WHERE p.price > 10
'''
result_df = pd.read_sql(query, conn)
print(result_df)
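
Write statements (UPDATE, DELETE, CREATE INDEX, ...) can also be run directly on the connection and the result read back with Pandas. A small sketch (the price change is made up for illustration):

conn.execute("UPDATE products SET price = price * 2 WHERE product = 'A'")  # Illustrative update
conn.commit()
print(pd.read_sql('SELECT * FROM products', conn))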

7 Pitfall avoidance guide

Data type matching

  • SQLite stores values with dynamic typing, while Pandas automatically infers column types when writing.
  • You can use the dtype parameter of to_sql to specify column types manually:
    df.to_sql('table', conn, dtype={'age': 'INTEGER', 'price': 'REAL'})

Primary keys and indexes

  • Pandas does not create primary keys or indexes automatically; the table structure needs to be defined in advance with SQL statements (see the sketch after this list).

Performance bottlenecks

  • When writing large amounts of data, turning off per-statement autocommit and using a single transaction speeds things up:
    with conn:
        df.to_sql(...)  # The context manager commits the transaction automatically
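
For example, to get a real primary key you can create the table yourself first and then append to it with to_sql. A sketch with assumed table and column names:

with sqlite3.connect('example.db') as conn:
    # Define the table structure (with a primary key) up front
    conn.execute('''
        CREATE TABLE IF NOT EXISTS users_keyed (
            id   INTEGER PRIMARY KEY,
            name TEXT,
            age  INTEGER
        )
    ''')
    # Append rows into the pre-defined table
    sample = pd.DataFrame({'id': [1, 2], 'name': ['Alice', 'Bob'], 'age': [30, 25]})
    sample.to_sql('users_keyed', conn, if_exists='append', index=False)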

8 Summary

With the combination of Pandas + SQLite3, you can:
✅ Quickly import/export data: say goodbye to hand-assembled SQL statements.
✅ Analyze data seamlessly: clean, compute, and visualize, then store the results straight back into the database.
✅ Process massive data: read and write in chunks to avoid blowing up memory.

Next steps

  • Try automatically synchronizing Excel/CSV files to a SQLite3 database.
  • Learn the sqlalchemy library to extend your SQL capabilities (see the sketch below).
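
As a first taste of the sqlalchemy route (a minimal sketch; it requires pip install sqlalchemy, and the file name is illustrative), Pandas accepts an engine wherever it accepts a connection:

from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('sqlite:///example.db')   # engine pointing at the same SQLite file
df = pd.read_sql('SELECT * FROM users', engine)  # read_sql works with an engine
df.to_sql('users_copy', engine, if_exists='replace', index=False)  # so does to_sql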

This is the end of this article on using Pandas with SQLite3 in practice. For more on using Pandas with SQLite3, please search my previous articles or continue browsing the related articles below. I hope you will keep supporting me!