Make data analysis more efficient! Use Pandas to read and write SQLite3 data directly, and say goodbye to manually concatenating SQL statements!
1 Environment preparation
Make sure pandas and sqlite3 are available (the former needs to be installed separately, while the latter is built into Python):
pip install pandas
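To confirm that both are available, a quick check like the following can be run (any recent pandas version will do):

import sqlite3
import pandas as pd

# Print versions to confirm both libraries import correctly
print('pandas:', pd.__version__)
print('SQLite engine:', sqlite3.sqlite_version)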
2 Read data from SQLite3 into a DataFrame
Basic usage: Read the entire table
import pandas as pd
import sqlite3

# Connect to the database (placeholder file name)
conn = sqlite3.connect('example.db')

# Read the users table into a DataFrame
df = pd.read_sql('SELECT * FROM users', conn)
print(df.head())  # View the first 5 rows of data

# Close the connection
conn.close()
Advanced Usage: Filtering and Aggregation
query = '''
SELECT name, AVG(age) AS avg_age  -- Calculate the average age
FROM users
WHERE age > 20
GROUP BY name
'''
df = pd.read_sql(query, conn)
print(df)
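If the filter value comes from a variable or user input, read_sql also accepts a params argument, so the value never has to be concatenated into the SQL string (a small sketch; min_age is an illustrative variable):

min_age = 20
df = pd.read_sql(
    'SELECT name, age FROM users WHERE age > ?',  # ? is sqlite3's parameter placeholder
    conn,
    params=(min_age,)  # the value is passed separately, not spliced into the SQL
)
print(df)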
3 Write DataFrame to SQLite3
Basic write (full overwrite)
# Create a sample DataFrame
data = {
    'name': ['David', 'Eve'],
    'age': [28, 32],
    'email': ['david@', 'eve@']
}
df = pd.DataFrame(data)

# Write to the users table (full overwrite)
df.to_sql(
    name='users',          # Table name
    con=conn,              # Database connection
    if_exists='replace',   # If the table exists, replace it (use with caution!)
    index=False            # Do not save the DataFrame index as a column
)
conn.commit()
Append data (incremental write)
df.to_sql(
    name='users',
    con=conn,
    if_exists='append',  # Append to the existing table
    index=False
)
conn.commit()
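Repeated appends can easily introduce duplicate rows. One possible guard, sketched below under the assumption that the email column uniquely identifies a user, is to filter out rows that already exist before calling to_sql:

# Load the existing keys (assumes email is unique per user)
existing = pd.read_sql('SELECT email FROM users', conn)

# Keep only rows whose email is not yet in the table, then append those
new_rows = df[~df['email'].isin(existing['email'])]
new_rows.to_sql('users', con=conn, if_exists='append', index=False)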
4 Practical scenarios: Data cleaning + storage
Suppose there is a CSV file dirty_data.csv that needs to be cleaned and stored in SQLite3:
id,name,age,email
1, Alice,30,alice@
2, Bob ,invalid,bob@       # Wrong age
3, Charlie,35,             # Missing email
Step 1: Clean the data with Pandas
# Read the CSV
df = pd.read_csv('dirty_data.csv')

# Cleaning operations
df['age'] = pd.to_numeric(df['age'], errors='coerce')  # Invalid ages become NaN
df = df.dropna(subset=['age'])                         # Drop rows with an invalid age
df['email'] = df['email'].fillna('unknown')            # Fill in missing emails
df['name'] = df['name'].str.strip()                    # Remove spaces around the names
print(df)
Step 2: Write to the database
with sqlite3.connect('example.db') as conn:  # placeholder file name
    # Write to a new table cleaned_users
    df.to_sql('cleaned_users', conn, index=False, if_exists='replace')

    # Verify the write result
    df_check = pd.read_sql('SELECT * FROM cleaned_users', conn)
    print(df_check)
5 Performance optimization: Write big data in chunks
When processing very large datasets (say, 100,000 rows or more), avoid loading everything into memory at once:
# Read the CSV in chunks (10,000 rows per chunk)
chunk_iter = pd.read_csv('big_data.csv', chunksize=10000)

with sqlite3.connect('big_db.db') as conn:
    for chunk in chunk_iter:
        # Apply simple processing to each chunk
        chunk['timestamp'] = pd.to_datetime(chunk['timestamp'])

        # Write the chunk to the database
        chunk.to_sql(
            name='big_table',
            con=conn,
            if_exists='append',  # Append mode
            index=False
        )

print("All chunks written!")
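The same trick works in the read direction: pd.read_sql also accepts a chunksize argument and then returns an iterator of DataFrames, so a huge table can be processed piece by piece (the table name below is the one from the example above):

with sqlite3.connect('big_db.db') as conn:
    # With chunksize, read_sql yields DataFrames instead of one big result
    for chunk in pd.read_sql('SELECT * FROM big_table', conn, chunksize=10000):
        print(len(chunk), 'rows in this chunk')  # replace with real per-chunk processing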
6 Advanced Tips: Perform SQL Operations Directly
Although Pandas is powerful, complex queries are often easier to express directly in SQL:
# Create a temporary products DataFrame (id serves as the join key)
df = pd.DataFrame({'id': [1, 2, 3], 'product': ['A', 'B', 'C'], 'price': [10, 200, 150]})

# Write to the products table
df.to_sql('products', conn, index=False, if_exists='replace')

# Execute a complex query (join the users, orders, and products tables)
query = '''
SELECT u.name, p.product, p.price
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN products p ON o.product_id = p.id
WHERE p.price > 10
'''
result_df = pd.read_sql(query, conn)
print(result_df)
7 Pit avoidance guide
Data type matching:
- SQLite is loosely typed and columns often end up as TEXT, while Pandas infers column types automatically.
- When writing, use the dtype parameter to specify column types manually:
  df.to_sql('table', conn, dtype={'age': 'INTEGER', 'price': 'REAL'})
Primary keys and indexes:
- Pandas does not create primary keys or indexes automatically; define the table structure in advance with SQL statements (see the sketch after this list).
Performance bottlenecks:
- When writing large amounts of data, turning off autocommit and wrapping the write in a single transaction speeds things up:
  with conn:
      df.to_sql(...)  # Committed automatically by the context manager
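For the primary-key point above, one way to do it (a minimal sketch reusing the users columns from earlier) is to create the table with plain SQL first and then let Pandas append into it:

with sqlite3.connect('example.db') as conn:  # placeholder file name
    # Define the schema, including the primary key, before Pandas touches the table
    conn.execute('''
        CREATE TABLE IF NOT EXISTS users (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT,
            age INTEGER,
            email TEXT
        )
    ''')
    # append keeps the predefined schema; replace would recreate the table without the key
    df.to_sql('users', conn, if_exists='append', index=False)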
8 Summary
With the combination of Pandas + SQLite3, you can:
✅ Quickly import/export data: say goodbye to manually concatenating SQL statements.
✅ Analyze data seamlessly: clean, compute, and visualize data, then write it straight to the database.
✅ Handle massive data: read and write in chunks to avoid running out of memory.
Next steps:
- Try automatically synchronizing Excel/CSV files to a SQLite3 database.
- Learn the sqlalchemy library to take SQL operations further (a small sketch follows below).
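As a taste of the second suggestion, a SQLAlchemy engine can be passed to to_sql/read_sql wherever a raw sqlite3 connection was used above (a minimal sketch; 'example.db' is a placeholder path):

from sqlalchemy import create_engine
import pandas as pd

# A URL of the form sqlite:///<path> points the engine at a SQLite file
engine = create_engine('sqlite:///example.db')

df = pd.DataFrame({'name': ['Frank'], 'age': [41], 'email': ['unknown']})
df.to_sql('users', engine, if_exists='append', index=False)

print(pd.read_sql('SELECT COUNT(*) AS n FROM users', engine))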