Easily get Excel data extraction with Pandas: Python skills that novices can learn
Preparation: Install the necessary libraries
First, we need to make sure that the pandas library is installed on the computer. If you haven't installed it, just type in the command line:
pip install pandas openpyxl
hereopenpyxl
It is a dependent library used to process Excel files.
Code analysis: look at one line by one
Let's take a closer look at the code in the picture, and I will explain the function of each line in the most common way:
# Import pandas libraryimport pandas as pd
This line of code is like saying to Python: "Hey, I'm going to use the pandas toolkit, prepare it for me!" pandas is a powerful tool for processing table data in Python.
# Compact displaypd.set_option('display.max_rows', 10)
This line of code sets the display option, which means "When there is too much data, only 10 lines will be displayed at most, and the others will be represented by ellipsis". In this way, our output will not flood the console and will look refreshing.
# Read file datadf = pd.read_excel('', index_col='Student number')
This is the most critical step: •pd.read_excel
It is a function that reads Excel files •''
It's our file name •index_col='Student number'
Indicates that the column "student number" is used as the identification (index) of each row
Imagine that it is like setting the "student number" column as the first column in Excel, so that we can search.
name = df['Name']
This line of code extracts the "name" column from the data, which is equivalent to selecting the entire name column in Excel.
# Output name columnprint(name)
Finally, we print out the name list and we can see a neat list of students!
Actual effect
After running this code, you will see an output like this:
Student ID 2301 Zhang San 2302 Li Si 2303 Wang Wu ... 2418 Zhao Liu Name: Name, dtype: object
The "..." in the middle means that some lines are omitted, which is exactly the compact display effect we set.
FAQ
Q: Why is my code reported an error?A: Possible reasons are:
- The file is not in the current directory (Solution: Use the full file path)
- File name typo (check case and extension)
- The column names do not match (confirm that there are indeed "student number" and "name" columns in Excel)
Q: How to display more/less rows?A: Modifydisplay.max_rows
The value of , for example, set to 5 will only display 5 lines.
Q: Can multiple columns be displayed at the same time?A: Of course! For example, to display the student number and name:
print(df[['Student number', 'Name']])
Tips sharing
- Run this code in Jupyter Notebook for more intuitive results
- Can be used
()
Quickly view the first few lines of data - Add to
encoding='utf-8'
Parameters can avoid the problem of garbled Chinese
Summarize
With this simple example, we learned:
- Read Excel files with pandas
- Set up a compact display
- Extract specific column data
- Output the sorted results
This skill is like a "Swiss Army Knife" for data processing, simple but super practical. With it mastered, you can quickly extract the required information from various forms without having to manually copy and paste it!
The above is the detailed content of how Python uses the Pandas library to extract data from Excel files. For more information about Python Pandas extracting Excel data, please follow my other related articles!