Pandas data reading and export
Pandas is a powerful Python library for data processing and analysis. It provides many functions for reading and writing data and supports multiple file formats, such as CSV, Excel, SQL databases, and JSON.
The following are some commonly used data reading and export methods:
Common methods
Data type | File format | Read function | Write (export) function |
---|---|---|---|
binary | Excel | read_excel | to_excel |
text | CSV | read_csv, read_table | to_csv |
text | JSON | read_json | to_json |
text | HTML table (web page) | read_html | to_html |
text | Clipboard | read_clipboard | to_clipboard |
SQL | SQL | read_sql | to_sql |
text | XML | read_xml | to_xml |
text | Markdown | | to_markdown |
Where:
- Read functions are called on the pandas module, and the result is usually assigned to a variable df: df = pd.read_xxx()
- Export functions are methods of the DataFrame itself and write out its contents: df.to_xxx()
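The read/write pattern described above can be sketched as a small round trip. This is only an illustration: the sample data and the names csv_text and csv_out are made up, and an in-memory buffer stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical sample data held in memory instead of a file on disk.
csv_text = "name,score\nAda,90\nLinus,85\n"

# Reading: a top-level pd.read_* function returns a new DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Writing: a df.to_* method is called on the DataFrame itself.
# With no path argument, to_csv returns the CSV text as a string.
csv_out = df.to_csv(index=False)
print(csv_out)
```

Most read functions accept a path or a file-like object, which is why io.StringIO works here in place of a file name.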
Common functions and methods
Read data
- pd.read_csv(): read a CSV file
- pd.read_excel(): read an Excel file
- pd.json_normalize(): flatten nested JSON data into a table
- pd.read_pickle(): load a pickled object
- pd.read_table(): read a general delimited text file
- pd.DataFrame.from_dict(): create a DataFrame from a dictionary
- pd.read_clipboard(): read data from the clipboard
- pd.read_json(): read JSON
- pd.read_sql(): read data from a database
- pd.read_fwf(): read a fixed-width format file
- pd.read_html(): extract table data from an HTML document
- pd.read_parquet(): read a Parquet file
Export data
- to_csv(): export to a CSV file
- to_excel(): export to an Excel file
- to_dict(): convert to a dictionary
- to_pickle(): serialize to a pickle file
- to_json(): convert to a JSON string
- to_html(): convert to an HTML table
- to_sql(): write to a relational database
- to_parquet(): save as a Parquet file
Data reading
- Read CSV files
    import pandas as pd

    # Parameters adjust how the file is read, such as the separator (sep),
    # missing-value markers (na_values), and column names (names).
    df = pd.read_csv('')
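As a rough sketch of the read_csv parameters mentioned above, the example below parses semicolon-separated text that has no header row. The data, the column names, and the "-" missing-value marker are all invented for illustration.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data with no header row.
# "NA" is treated as missing by default; "-" is added via na_values.
raw = "1;Ada;90\n2;Linus;NA\n3;Grace;-\n"

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",                        # field separator
    header=None,                    # the data has no header row
    names=["id", "name", "score"],  # column names to assign
    na_values=["-"],                # extra strings to treat as missing
)
print(df)
```

Because two of the scores are missing, pandas stores the score column as floating point so that it can hold NaN.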
- Read Excel files
    # sheet_name accepts a worksheet name or a zero-based index;
    # sheet_name=None reads all worksheets into a dict of DataFrames.
    df = pd.read_excel('', sheet_name='Sheet1')
- Read from SQL database
    import sqlite3  # or another database connection library

    conn = sqlite3.connect('')
    df = pd.read_sql_query('SELECT * FROM table_name', conn)
    conn.close()
    # Other databases, such as MySQL or PostgreSQL, need their own connection library and driver.
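To try the SQL workflow without an existing database, a minimal sketch can use an in-memory SQLite database. The users table and its rows are hypothetical, and the params argument is shown as one way to pass values into the query.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Linus")])
conn.commit()

try:
    # params keeps the value out of the SQL string itself.
    df = pd.read_sql_query(
        "SELECT * FROM users WHERE id >= ?", conn, params=(1,)
    )
finally:
    conn.close()  # release the connection even if the query fails

print(df)
```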
- Read JSON files
    # The orient parameter specifies the layout of the JSON data.
    df = pd.read_json('')
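The effect of orient can be illustrated with two layouts of the same made-up data. In this sketch the JSON strings are wrapped in io.StringIO, since recent pandas versions discourage passing literal JSON text directly.

```python
import io
import pandas as pd

# The same hypothetical data in two JSON layouts.
records_json = '[{"name": "Ada", "score": 90}, {"name": "Linus", "score": 85}]'
columns_json = '{"name": {"0": "Ada", "1": "Linus"}, "score": {"0": 90, "1": 85}}'

# orient="records": a list of row objects.
df_records = pd.read_json(io.StringIO(records_json), orient="records")

# orient="columns": one object per column, keyed by row label.
df_columns = pd.read_json(io.StringIO(columns_json), orient="columns")

print(df_records)
print(df_columns)
```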
- Read HTML tables
    # read_html returns a list of DataFrames; index [0] selects the first table.
    df = pd.read_html('/page_with_table.html')[0]
- Read from the clipboard
    # Particularly useful for data copied from applications such as Excel.
    df = pd.read_clipboard()
Data Export
- Export to CSV file
    # index=False leaves the DataFrame index out of the file.
    df.to_csv('output_file.csv', index=False)
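The difference index=False makes is easiest to see in the header line. The sketch below uses a made-up DataFrame and lets to_csv return a string instead of writing a file.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ada", "Linus"], "score": [90, 85]})

# Default: the index is written as an unnamed first column.
with_index = df.to_csv()

# index=False: only the data columns are written.
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])
print(without_index.splitlines()[0])
```

Keeping the index is mainly useful when it carries meaning, such as dates or IDs; for a default 0..n-1 index it usually just adds a redundant column.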
- Export to Excel file
    # You can set the worksheet name and other options such as engine.
    # Writing .xlsx needs an engine such as 'openpyxl' (pandas picks 'xlsxwriter' instead when it is installed).
    df.to_excel('output_file.xlsx', sheet_name='Sheet1', index=False)
- Export to SQL database
    import sqlite3

    conn = sqlite3.connect('')
    df.to_sql('table_name', conn, if_exists='replace', index=False)
    conn.close()
    # if_exists can be 'fail' (raise an error if the table exists),
    # 'replace' (drop and recreate the table), or 'append' (add rows to the existing table).
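The three if_exists modes can be compared side by side with an in-memory SQLite database. The table name, the data, and the count_rows helper are hypothetical and exist only for this sketch.

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["Ada", "Linus"]})
conn = sqlite3.connect(":memory:")


def count_rows(connection):
    """Hypothetical helper: number of rows currently in the users table."""
    result = pd.read_sql_query("SELECT COUNT(*) AS n FROM users", connection)
    return int(result["n"].iloc[0])


try:
    df.to_sql("users", conn, index=False)  # table does not exist yet: created

    df.to_sql("users", conn, if_exists="append", index=False)
    rows_after_append = count_rows(conn)   # original rows plus the new ones

    df.to_sql("users", conn, if_exists="replace", index=False)
    rows_after_replace = count_rows(conn)  # table dropped and rewritten

    try:
        df.to_sql("users", conn, if_exists="fail", index=False)
        failed = False
    except ValueError:
        failed = True                      # 'fail' refuses to touch an existing table
finally:
    conn.close()

print(rows_after_append, rows_after_replace, failed)
```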
- Export to JSON file
    # orient sets the JSON layout; lines=True writes one JSON object per line.
    df.to_json('output_file.json', orient='records', lines=True)
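To see what the line-delimited layout looks like, the sketch below writes a made-up DataFrame to a string and reads it back. Each line parses as a standalone JSON object, which is what makes the format convenient for streaming and appending.

```python
import io
import json
import pandas as pd

df = pd.DataFrame({"name": ["Ada", "Linus"], "score": [90, 85]})

# With no path, to_json returns the text; lines=True requires orient="records".
jsonl = df.to_json(orient="records", lines=True)
print(jsonl)

# Every line is a complete JSON object on its own.
first_row = json.loads(jsonl.splitlines()[0])

# Reading it back needs lines=True as well.
df_back = pd.read_json(io.StringIO(jsonl), orient="records", lines=True)
```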
- Export to HTML file
    # Called without a path, to_html() returns the HTML as a string, which is then written to a file.
    with open('output_file.html', 'w') as f:
        f.write(df.to_html())
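As an alternative to opening the file by hand, to_html also accepts a path. The following sketch shows both forms with made-up data and a temporary directory as a hypothetical output location.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"name": ["Ada", "Linus"], "score": [90, 85]})

# Without a path, to_html returns the markup as a string.
html = df.to_html(index=False)

# With a path, to_html writes the file itself and returns None.
path = os.path.join(tempfile.mkdtemp(), "table.html")  # hypothetical location
result = df.to_html(path, index=False)

with open(path, encoding="utf-8") as f:
    file_text = f.read()

print(html)
```

The string form is handy when the table needs to be embedded in a larger page or template before saving.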
- Multiple worksheets exported to Excel
    # The ExcelWriter context manager makes it easy to write several worksheets to one file.
    with pd.ExcelWriter('output_file_with_sheets.xlsx') as writer:
        df1.to_excel(writer, sheet_name='Sheet1', index=False)
        df2.to_excel(writer, sheet_name='Sheet2', index=False)
Notes:
- File path: Ensure that the file path is correct and that the program has appropriate read and write permissions.
- Data types: When exporting, check that the data types are compatible with the target format, especially when the data contains special characters or date-time values.
- Dependencies: Some export methods (such as writing to SQL databases) may require additional libraries and database drivers.
- Performance: For large datasets, exporting to certain formats (such as Excel) can be slow and memory-intensive. In this case, consider exporting the data in batches or using a format better suited to large datasets (such as CSV).
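One possible way to export in batches is sketched below. It assumes the batches can be sliced from a DataFrame; in practice they might come from a database cursor or another source. The batch size, file location, and data are all hypothetical.

```python
import os
import tempfile
import pandas as pd

# Small stand-in for a large dataset.
df = pd.DataFrame({"id": range(10), "value": [i * 1.5 for i in range(10)]})
batch_size = 4  # hypothetical; tune to the available memory

path = os.path.join(tempfile.mkdtemp(), "batched.csv")  # hypothetical location
for start in range(0, len(df), batch_size):
    batch = df.iloc[start:start + batch_size]
    batch.to_csv(
        path,
        mode="w" if start == 0 else "a",  # overwrite first, then append
        header=(start == 0),              # write the header only once
        index=False,
    )

# Reading can be batched too: chunksize makes read_csv return an iterator.
total_rows = sum(len(chunk) for chunk in pd.read_csv(path, chunksize=4))
print(total_rows)
```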
Summary
The above is based on my personal experience. I hope it serves as a useful reference.