SoFunction
Updated on 2024-10-30

Pandas data grouping statistics implementation examples

1. Grouping statistics with the groupby() function

Grouping statistics on data are computed mainly with the groupby() function of the DataFrame object. It does the following:

(1) Split data into groups based on specific conditions

(2) Each group can apply functions independently (e.g. summation function sum(), mean function mean(), etc.)

(3) Combine results into one data structure
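The three steps above can be sketched on a small, made-up table. The column names follow the article's order data, but the values and the `orders` and `totals` names are assumptions for illustration only.

```python
import pandas as pd

# Made-up order data; only the column names come from the article.
orders = pd.DataFrame({
    'Primary classification': ['Programming', 'Programming', 'Database', 'Database'],
    '7-day hits': [100, 50, 30, 20],
    'Order Booking': [10, 5, 3, 2],
})

# Split: one group per distinct value of 'Primary classification'.
grouped = orders.groupby('Primary classification')

# Apply and combine: sum the remaining columns within each group
# and collect the results into a single DataFrame.
totals = grouped.sum()
print(totals)
```

The result is indexed by the grouping column, so a single total can be read with `totals.loc['Programming', '7-day hits']`.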

Example 1.

Group the order data by "Primary classification" and sum each group.

import pandas as pd  # Import the pandas module
df=pd.read_csv('',encoding='gbk')
# Extract the columns of interest
df1=df[['Primary classification','7-day hits','Order Booking']]
df1=df1.groupby('Primary classification').sum()       # Group, then sum each group

Example 2.

Group the order data by both the "Primary classification" and the "Secondary Classification" of the books, then sum each group.

import pandas as pd  # Import the pandas module
df=pd.read_csv('',encoding='gbk')
# Extract the columns of interest
df1=df[['Primary classification','Secondary Classification','7-day hits','Order Booking']]
df2=df1.groupby(['Primary classification','Secondary Classification']).sum()    # Group on both columns, then sum each group

Example 3.

Find the 7-day hits for each secondary category: group by "Secondary Classification", select the "7-day hits" column, and sum each group.

df1 = df1.groupby('Secondary Classification')['7-day hits'].sum()
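Because the CSV file used above is not available here, the following sketch repeats the pattern of Examples 2 and 3 on made-up data. The `books` table and its values are hypothetical. Selecting one column after groupby() gives a Series, and grouping on two columns gives a result indexed by pairs of keys.

```python
import pandas as pd

# Hypothetical book data modeled on the article's columns.
books = pd.DataFrame({
    'Primary classification': ['Programming', 'Programming', 'Programming', 'Database'],
    'Secondary Classification': ['Python', 'Python', 'Java', 'SQL'],
    '7-day hits': [100, 50, 40, 30],
})

# One grouping column, one selected column: the result is a Series.
hits = books.groupby('Secondary Classification')['7-day hits'].sum()
print(hits)

# Two grouping columns: the result is indexed by (primary, secondary) pairs.
hits_by_both = books.groupby(
    ['Primary classification', 'Secondary Classification'])['7-day hits'].sum()
print(hits_by_both)
```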

2. Iterate over grouped data

Example 1.

Group by "Primary classification" and print the order data in each group.

# Extract the columns of interest
df1 = df[['Primary classification','7-day hits','Order Booking']]
for name, group in df1.groupby('Primary classification'):
    print(name)
    print(group)

Here name is the group key (a value from the 'Primary classification' column) and group is the DataFrame holding that group's rows. When groupby() groups on multiple columns, the key is a tuple with one value per grouping column, so the for loop needs to unpack multiple names.
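A minimal sketch of iterating over a multi-column grouping, assuming a made-up `books` table. The loop variable names `primary` and `secondary` and the `sizes` dictionary are illustrative choices, not part of the original example.

```python
import pandas as pd

# Hypothetical book data modeled on the article's columns.
books = pd.DataFrame({
    'Primary classification': ['Programming', 'Programming', 'Programming', 'Database'],
    'Secondary Classification': ['Python', 'Python', 'Java', 'SQL'],
    '7-day hits': [100, 50, 40, 30],
})

sizes = {}
# With two grouping columns, each key is a (primary, secondary) tuple.
for (primary, secondary), group in books.groupby(
        ['Primary classification', 'Secondary Classification']):
    print(primary, secondary)
    print(group)
    sizes[(primary, secondary)] = len(group)  # number of rows in this group
```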

3. Applying aggregation functions to one or more columns after grouping

As in SQL, Python can perform grouping and aggregation, mainly through the groupby() and agg() functions.

The following code does two things:

1. Groups by 'Primary classification' and computes the mean and the sum of each group.

2. Groups by 'Primary classification', then computes the mean and the sum of '7-day hits' and the sum of 'Order Booking'.

df1.groupby('Primary classification').agg(['mean','sum'])

df1.groupby('Primary classification').agg({'7-day hits':['mean','sum'],'Order Booking':['sum']})
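To show what the dictionary form of agg() returns, here is a small sketch on made-up data (the `orders` values are assumptions). Each requested statistic becomes a column labeled by the pair (source column, function name).

```python
import pandas as pd

# Made-up order data; only the column names come from the article.
orders = pd.DataFrame({
    'Primary classification': ['Programming', 'Programming', 'Database'],
    '7-day hits': [100, 50, 30],
    'Order Booking': [10, 5, 3],
})

# Different statistics for different columns, one dictionary entry per column.
stats = orders.groupby('Primary classification').agg(
    {'7-day hits': ['mean', 'sum'], 'Order Booking': ['sum']})
print(stats)

# Individual values are addressed with a (column, function) pair.
print(stats.loc['Programming', ('7-day hits', 'mean')])
```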

Grouping statistics can also be computed with custom functions (book, p. 110).

The following code implements it:

1. From the January sales data, find the most frequently purchased product, the average purchase quantity, the average amount paid, the total purchase quantity, and the total amount paid.

df = pd.read_excel('January.xlsx')
# Custom function: the most frequent value in a column (NaN is counted too)
max1 = lambda x: x.value_counts(dropna=False).index[0]
df1 = df.agg({'Baby Title':[max1],
              'Number':['sum','mean'],
              'Actual amount paid by the seller':['sum','mean']})
print(df1)
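The same idea can be tried without the Excel file. The sketch below uses a made-up `sales` table and a named function, `most_frequent`, in place of the lambda. Both are assumptions for illustration. A named function has the side effect that its name labels the result row.

```python
import pandas as pd

# Hypothetical sales rows; the real January.xlsx is not reproduced here.
sales = pd.DataFrame({
    'Baby Title': ['Book A', 'Book B', 'Book A'],
    'Number': [1, 2, 3],
})

def most_frequent(column):
    """Return the value that occurs most often in the column."""
    return column.value_counts(dropna=False).index[0]

# One list of functions per column; cells with no matching function are NaN.
summary = sales.agg({'Baby Title': [most_frequent],
                     'Number': ['sum', 'mean']})
print(summary)
```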

4. Grouping statistics via dictionaries and Series objects

1. Grouping statistics via a dictionary

Create a dictionary and pass it to the groupby() function, which then groups according to the mapping in the dictionary.

import pandas as pd  # Import the pandas module
# Fix misaligned column names in printed output
pd.set_option('display.unicode.east_asian_width', True)
df=pd.read_csv('',encoding='gbk')  # Import the csv file
df=df.set_index(['Trade name'])
# Create a dictionary that maps each column name to a group name
mapping={'Beijing outbound sales':'Up North','Shanghai Outbound Sales':'Up North',
         'Guangzhou outbound sales':'Up North','Chengdu outbound sales':'Chengdu',
         'Wuhan outbound sales':'Wuhan',"Xi'an outbound sales":"Xi'an"}
# axis=1 groups the columns; it is deprecated since pandas 2.1,
# where df.T.groupby(mapping).sum().T gives the same result
df1=df.groupby(mapping,axis=1).sum()
print(df1)
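Since the `axis=1` argument of groupby() is deprecated in pandas 2.1 and later, a hedged alternative is to transpose, group the rows, and transpose back. The sketch below assumes a made-up `sales` table with only three of the article's columns.

```python
import pandas as pd

# Hypothetical per-city sales, indexed by product name.
sales = pd.DataFrame(
    {'Beijing outbound sales': [10, 20],
     'Shanghai Outbound Sales': [5, 5],
     'Chengdu outbound sales': [1, 2]},
    index=pd.Index(['Book A', 'Book B'], name='Trade name'))

# Map each column name to the region it belongs to.
mapping = {'Beijing outbound sales': 'Up North',
           'Shanghai Outbound Sales': 'Up North',
           'Chengdu outbound sales': 'Chengdu'}

# Transpose so the city columns become rows, group those rows by region,
# sum, and transpose back to one column per region.
by_region = sales.T.groupby(mapping).sum().T
print(by_region)
```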

2. Grouping statistics via a Series object

Create a Series object and pass it to the groupby() function to group the data. The Series maps each index label to a value: for example, the index 'Beijing outbound sales' has the corresponding value 'Up North'.

import pandas as pd  # Import the pandas module
# Fix misaligned column names in printed output
pd.set_option('display.unicode.east_asian_width', True)
df=pd.read_csv('',encoding='gbk')  # Import the csv file
df=df.set_index(['Trade name'])
data={'Beijing outbound sales':'Up North','Shanghai Outbound Sales':'Up North',
         'Guangzhou outbound sales':'Up North','Chengdu outbound sales':'Chengdu',
         'Wuhan outbound sales':'Wuhan',"Xi'an outbound sales":"Xi'an"}
s1=pd.Series(data)  # Index: column name, value: group name
print(s1)
# axis=1 groups the columns; it is deprecated since pandas 2.1,
# where df.T.groupby(s1).sum().T gives the same result
df1=df.groupby(s1,axis=1).sum()
print(df1)
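A short sketch, on made-up data, suggesting that a Series key and a dictionary key lead to the same grouping. It uses the transpose form so that it does not depend on the deprecated `axis=1` argument. The `sales` table and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical per-city sales, indexed by product name.
sales = pd.DataFrame(
    {'Beijing outbound sales': [10, 20],
     'Shanghai Outbound Sales': [5, 5],
     'Chengdu outbound sales': [1, 2]},
    index=pd.Index(['Book A', 'Book B'], name='Trade name'))

data = {'Beijing outbound sales': 'Up North',
        'Shanghai Outbound Sales': 'Up North',
        'Chengdu outbound sales': 'Chengdu'}
s1 = pd.Series(data)  # index: column name, value: region

# The Series is aligned with the row labels of the transposed table.
by_series = sales.T.groupby(s1).sum().T
by_dict = sales.T.groupby(data).sum().T
print(by_series)
```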
