Preface
In Pandas, the describe() function provides a statistical summary of the numeric columns of a data frame (DataFrame).
I. What is the describe() function?
The describe() function is a method of the Pandas DataFrame object that generates a statistical summary of the data it holds. For each numeric column, the summary includes the count of non-missing values, mean, standard deviation, minimum, 25th percentile (first quartile), median (50th percentile), 75th percentile (third quartile), and maximum.
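As a small illustration of the count row, the sketch below uses a hypothetical one-column data frame with one deliberately missing value (this data is an assumption for illustration and is not the article's example). It suggests that count reports non-missing values only, and that the mean row agrees with what Series.mean() returns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one Age value is missing on purpose
df = pd.DataFrame({"Age": [25, 30, np.nan, 40, 45]})

summary = df.describe()

# count skips the missing value, so it is 4 here even though there are 5 rows
age_count = summary.loc["count", "Age"]

# The mean row matches Series.mean(), which also skips missing values
age_mean = summary.loc["mean", "Age"]

print(summary)
print(age_count, age_mean, df["Age"].mean())
```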
II. Basic usage of the describe() function
import pandas as pd

# Create a sample data frame
data = {'Age': [25, 30, 35, 40, 45], 'Salary': [50000, 60000, 75000, 80000, 90000]}
df = pd.DataFrame(data)

# Use the describe() function to generate summary statistics
summary = df.describe()
print(summary)
Output:
             Age        Salary
count   5.000000      5.000000
mean   35.000000  71000.000000
std     7.905694  15968.719423
min    25.000000  50000.000000
25%    30.000000  60000.000000
50%    35.000000  75000.000000
75%    40.000000  80000.000000
max    45.000000  90000.000000
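Because the object returned by describe() is itself a DataFrame, individual statistics can be read out by row and column label. The following is a minimal sketch that rebuilds the sample data from this section; the variable names mean_salary, median_age, and transposed are only illustrative.

```python
import pandas as pd

# Same sample data as in the basic example
data = {'Age': [25, 30, 35, 40, 45], 'Salary': [50000, 60000, 75000, 80000, 90000]}
df = pd.DataFrame(data)
summary = df.describe()

# Rows are statistics and columns are the original column names
mean_salary = summary.loc["mean", "Salary"]
median_age = summary.loc["50%", "Age"]

# Transposing gives one row per original column, which can be easier to read
# when the data frame has many columns
transposed = summary.T

print(mean_salary, median_age)
print(transposed)
```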
III. Parameters of the describe() function
The describe() function has several optional parameters that control the output of the statistical summary.
- percentiles: Specifies the percentiles to be calculated, given as values between 0 and 1. The default is [0.25, 0.5, 0.75], that is, the 25th, 50th, and 75th percentiles.
- include and exclude: Select the data types to include in or exclude from the summary. By default, only numeric columns are included.
- datetime_is_numeric: If True, datetime columns are treated as numeric columns for statistical purposes. This parameter exists only in pandas 1.1 through 1.5; it was removed in pandas 2.0, where datetime columns are always summarized this way.
IV. Sample code
1. Using the percentiles parameter
Set the percentiles parameter to customize the percentiles to be calculated. Example:
custom_percentiles = [0.1, 0.5, 0.9]
custom_summary = df.describe(percentiles=custom_percentiles)
print(custom_summary)
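To make the link between the values passed in and the rows that come back more concrete, the sketch below rebuilds the sample data and compares the percentile rows with Series.quantile(). The name salary_quantiles is illustrative only, and the figures rely on the default linear interpolation.

```python
import pandas as pd

data = {'Age': [25, 30, 35, 40, 45], 'Salary': [50000, 60000, 75000, 80000, 90000]}
df = pd.DataFrame(data)

custom_percentiles = [0.1, 0.5, 0.9]
custom_summary = df.describe(percentiles=custom_percentiles)

# Each fraction becomes a row label formatted as a percentage: '10%', '50%', '90%'
print(list(custom_summary.index))

# The same values can be obtained directly with quantile()
salary_quantiles = df['Salary'].quantile(custom_percentiles)
print(salary_quantiles)
print(custom_summary.loc[['10%', '50%', '90%'], 'Salary'])
```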
2. Using the include and exclude parameters
Use the include and exclude parameters to select the data types to include in or exclude from the summary. Example:
# Summarize only the integer columns
integer_summary = df.describe(include='int')
print(integer_summary)
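The effect of include and exclude is easier to see on a data frame with mixed column types. The sketch below uses a hypothetical data frame with a text column (Name) alongside numeric columns; the data and variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type data: one text column and two numeric columns
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Ann"],
    "Age": [25, 30, 35, 40],
    "Score": [88.5, 92.0, 79.5, 85.0],
})

# Default behaviour: only the numeric columns are summarized
numeric_summary = df.describe()

# Excluding numeric types leaves the text column, which is summarized
# with count, unique, top (most frequent value), and freq
non_numeric_summary = df.describe(exclude=[np.number])

# include='all' summarizes every column; statistics that do not apply
# to a column are shown as NaN
full_summary = df.describe(include="all")

print(numeric_summary)
print(non_numeric_summary)
print(full_summary)
```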
3. Handling datetime columns
If the data frame contains a datetime column, the datetime_is_numeric parameter treats it as a numeric column for statistical purposes. Example:
import datetime

# df has five rows, so the new column needs five dates
dates = [datetime.datetime(2022, 1, day) for day in range(1, 6)]
df['Date'] = dates
# Accepted by pandas 1.1 to 1.5 only; on pandas 2.0 or later call df.describe() without it
date_summary = df.describe(datetime_is_numeric=True)
print(date_summary)
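Since datetime_is_numeric is accepted only by some pandas versions, one possible way to write code that runs on both older and newer releases is to fall back to a plain describe() call when the argument is rejected. This is a sketch under that assumption rather than an officially recommended pattern, and the small data frame is hypothetical.

```python
import pandas as pd

# Hypothetical data frame with one numeric and one datetime column
df = pd.DataFrame({
    "Age": [25, 30, 35],
    "Date": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03"]),
})

try:
    # pandas 1.1 to 1.5 accept the argument
    date_summary = df.describe(datetime_is_numeric=True)
except TypeError:
    # pandas 2.0 and later removed the argument and summarize
    # datetime columns numerically by default
    date_summary = df.describe()

print(date_summary)

# The datetime column gets date-valued statistics such as min, mean, and max
earliest = date_summary.loc["min", "Date"]
latest = date_summary.loc["max", "Date"]
print(earliest, latest)
```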