SoFunction
Updated on 2024-10-29

Using the Pandas describe() function

Preface

In Pandas, the describe() function provides summary statistics for the numeric columns of a data frame (DataFrame).

I. What is the describe() function?

The describe() function is a method of the Pandas DataFrame object that generates a statistical summary of the data. For numeric columns, the summary includes the count of non-null values, mean, standard deviation, minimum, 25th percentile (first quartile), median (50th percentile), 75th percentile (third quartile), and maximum.
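To illustrate what the count row measures, here is a minimal sketch using a hypothetical column with one missing value. The frame name, column name, and numbers are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: four rows, one of them missing
scores = pd.DataFrame({'Score': [10.0, 20.0, np.nan, 30.0]})

stats = scores.describe()

# count reports non-null values only, so it is 3 here rather than 4
print(stats.loc['count', 'Score'])

# The mean is likewise computed over the non-null values
print(stats.loc['mean', 'Score'])
```

Missing values are skipped rather than treated as zero, so the other statistics are not distorted by gaps in the data.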

II. Basic use of the describe() function

import pandas as pd

# Create a sample data frame
data = {'Age': [25, 30, 35, 40, 45],
        'Salary': [50000, 60000, 75000, 80000, 90000]}

df = pd.DataFrame(data)

# Use the describe() function to generate summary statistics
summary = df.describe()

print(summary)

             Age        Salary
count   5.000000      5.000000
mean   35.000000  71000.000000
std     7.905694  15968.719423
min    25.000000  50000.000000
25%    30.000000  60000.000000
50%    35.000000  75000.000000
75%    40.000000  80000.000000
max    45.000000  90000.000000
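The summary returned by describe() is itself a DataFrame, so individual statistics can be looked up by label. The sketch below rebuilds the same sample data; the variable names mean_age and median_salary are chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 35, 40, 45],
                   'Salary': [50000, 60000, 75000, 80000, 90000]})
summary = df.describe()

# Rows are statistics, columns are the original column names
mean_age = summary.loc['mean', 'Age']
median_salary = summary.loc['50%', 'Salary']
print(mean_age, median_salary)

# Transposing gives one row per original column, which can be easier to read for wide frames
print(summary.T)
```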

III. Parameters of the describe() function

The describe() function has some optional parameters that control the output of the statistical summary.

  • percentiles: The percentiles to calculate, given as numbers between 0 and 1. The default is [0.25, 0.5, 0.75], which gives the 25th, 50th, and 75th percentiles (the three quartiles).
  • include and exclude: Select the data types to include in or exclude from the summary. By default, only numeric columns are included.
  • datetime_is_numeric: If True, datetime columns are treated as numeric for statistical purposes. This parameter exists only in pandas 1.1 through 1.5; it was removed in pandas 2.0, where datetime columns are always summarized this way.

IV. Sample code

1. Use of the percentiles parameter

Set the percentiles parameter to customize the percentiles to be calculated. Example:

custom_percentiles = [0.1, 0.5, 0.9]
custom_summary = df.describe(percentiles=custom_percentiles)
print(custom_summary)
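As a self-contained sketch of how the custom percentiles appear in the result, the example below rebuilds the sample data and compares one percentile row with Series.quantile(). The variable names p90_from_describe and p90_from_quantile are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 35, 40, 45],
                   'Salary': [50000, 60000, 75000, 80000, 90000]})

custom_summary = df.describe(percentiles=[0.1, 0.5, 0.9])

# Each requested percentile becomes a row labelled as a percentage
print(custom_summary.index.tolist())

# A percentile row holds the same value that quantile() returns for that column
p90_from_describe = custom_summary.loc['90%', 'Age']
p90_from_quantile = df['Age'].quantile(0.9)
print(p90_from_describe, p90_from_quantile)
```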

2. Use of the include and exclude parameters

Use the include and exclude parameters to select the data types to include or exclude. For example, include='int' restricts the summary to integer columns:

integer_summary = df.describe(include='int')
print(integer_summary)
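The effect of include and exclude is easier to see on a frame with mixed column types. The following is a minimal sketch with a hypothetical frame named people; the data and variable names are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type frame: one text column, two numeric columns
people = pd.DataFrame({'Name': ['Ann', 'Bob', 'Ann'],
                       'Age': [25, 30, 35],
                       'Salary': [50000.0, 60000.0, 75000.0]})

# Only numeric columns
numeric_summary = people.describe(include=[np.number])

# Everything except numeric columns: text columns get count, unique, top, freq
text_summary = people.describe(exclude=[np.number])

# All columns at once; statistics that do not apply to a column are NaN
full_summary = people.describe(include='all')

print(numeric_summary.columns.tolist())
print(text_summary)
print(full_summary)
```

Passing include='all' is a convenient way to get a single overview, at the cost of many NaN cells where a statistic has no meaning for a column.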

3. Processing of date-time columns

If the data frame contains a datetime column, the datetime_is_numeric parameter (available in pandas 1.1 through 1.5) treats it as a numeric column for statistical purposes. Example:

import datetime

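# One date per row: df has five rows, so five dates are needed
dates = [datetime.datetime(2022, 1, day) for day in range(1, 6)]
df['Date'] = dates

# datetime_is_numeric is accepted only by pandas 1.1 to 1.5; pandas 2.0+ raises TypeError for it
date_summary = df.describe(datetime_is_numeric=True)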
print(date_summary)
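Because datetime_is_numeric is not accepted by every pandas release, one possible way to write code that runs across versions is to fall back to a plain describe() call when the keyword is rejected. This is only a sketch under that assumption, using a small frame built here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 35, 40, 45]})
df['Date'] = pd.date_range('2022-01-01', periods=len(df), freq='D')

try:
    # Releases that still have the keyword accept it here
    date_summary = df.describe(datetime_is_numeric=True)
except TypeError:
    # Releases without the keyword reject it; pandas 2.0+ summarizes
    # datetime columns numerically without being asked
    date_summary = df.describe()

print(date_summary)
```

On releases older than 1.1 the fallback call also runs, but the Date column is then left out of the default summary.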
