SoFunction
Updated on 2025-03-02

How to use Pandas to delete non-numeric data from a DataFrame

Data processing and analysis often begin with cleaning. One common task is deleting non-numeric data from a DataFrame, because such data can interfere with numerical calculations and statistical analysis. Python's Pandas library provides several tools for this. This article explains how to identify non-numeric data in a DataFrame, how to delete or convert it, and how to apply these techniques in a practical example.

Identify non-numeric data

Before deleting non-numeric data, you first need to find it. Pandas provides several ways to do this, including the dtypes attribute, the select_dtypes() method, and the info() method.

import pandas as pd

# Create a DataFrame with mixed data types
data = {'A': [1, '2', 3, '4', 5],
        'B': [1.1, 2.2, 3.3, 4.4, 5.5],
        'C': ['a', 'b', 'c', 'd', 'e']}
df = pd.DataFrame(data)

# Use the dtypes attribute to view the data type of each column
print(df.dtypes)

Output result:

A     object
B    float64
C     object
dtype: object

In this example, a DataFrame with mixed data types is created and the dtypes attribute is used to view the data type of each column. Columns 'A' and 'C' have the object dtype, which means they hold non-numeric data: 'A' mixes integers with numeric strings, and 'C' contains only text.
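Because an object column can hold any mix of Python values, it can help to look at the element types directly. The following is a minimal sketch that assumes the same sample data as above; the name type_counts is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, '2', 3, '4', 5],
                   'B': [1.1, 2.2, 3.3, 4.4, 5.5],
                   'C': ['a', 'b', 'c', 'd', 'e']})

# Count how many elements of column 'A' belong to each Python type
type_counts = df['A'].map(lambda x: type(x).__name__).value_counts()
print(type_counts)
```

With this data, three of the values in 'A' are int and two are str, which is why Pandas falls back to the object dtype for the column.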

In addition to the dtypes attribute, we can use the select_dtypes() method to select columns of a specific data type, and the info() method to view a summary of the whole DataFrame.

# Use the select_dtypes() method to select columns with non-numeric types
non_numeric_columns = df.select_dtypes(exclude=['number']).columns
print("Columns of non-numeric types:", non_numeric_columns)

# Use the info() method to view the overall information of the DataFrame
# (info() prints its summary itself and returns None, so print() adds a trailing "None")
print(df.info())

Output result:

Columns of non-numeric types: Index(['A', 'C'], dtype='object')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   A       5 non-null      object 
 1   B       5 non-null      float64
 2   C       5 non-null      object 
dtypes: float64(1), object(2)
memory usage: 248.0+ bytes
None
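The same column-level check can also be made one column at a time. The sketch below uses is_numeric_dtype from pandas.api.types, which is not mentioned in the article, so treat it as an optional alternative; the dictionary name is_numeric is illustrative.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({'A': [1, '2', 3, '4', 5],
                   'B': [1.1, 2.2, 3.3, 4.4, 5.5],
                   'C': ['a', 'b', 'c', 'd', 'e']})

# Map each column name to True if its dtype is numeric, False otherwise
is_numeric = {col: is_numeric_dtype(df[col]) for col in df.columns}
print(is_numeric)
```

This form is convenient when the result feeds into further logic, such as a loop that handles numeric and non-numeric columns differently.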

Delete non-numeric data

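Once the non-numeric data is identified, Pandas offers several ways to remove it: the drop() method, Boolean indexing, the applymap() method, and the to_numeric() function. The first two delete columns or rows outright. The last two convert values that cannot be parsed as numbers to NaN, which can then be removed with dropna() if needed.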

1. Use the drop() method to delete columns of non-numeric types

# Use the drop() method to delete columns of non-numeric types
df_numeric = df.drop(columns=non_numeric_columns)
print("DataFrame after deleting non-numeric data:")
print(df_numeric)
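Dropping the non-numeric columns is equivalent to keeping only the numeric ones. As a hedged sketch on the same sample data, the comparison below checks that both routes give the same result; the names dropped and selected are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, '2', 3, '4', 5],
                   'B': [1.1, 2.2, 3.3, 4.4, 5.5],
                   'C': ['a', 'b', 'c', 'd', 'e']})

# Route 1: find the non-numeric columns, then drop them
non_numeric_columns = df.select_dtypes(exclude=['number']).columns
dropped = df.drop(columns=non_numeric_columns)

# Route 2: keep only the numeric columns in a single step
selected = df.select_dtypes(include=['number'])

print(dropped.equals(selected))
```

Note that both routes remove column 'A' entirely, even though most of its values are numbers or numeric-looking strings. When a column is only partly non-numeric, the row-level and conversion approaches may be a better fit.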

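2. Use a Boolean index to delete rows containing non-numeric values

# Use a Boolean index to keep only the rows whose 'A' and 'B' values are all numeric
# (column 'C' is text only, so including it in the check would remove every row)
is_number = df[['A', 'B']].applymap(lambda x: isinstance(x, (int, float)))
df_numeric = df[is_number.all(axis=1)]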
print("DataFrame after deleting non-numeric data:")
print(df_numeric)
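A type check with isinstance() treats the string '2' as non-numeric, even though it can be parsed as a number. The sketch below contrasts a type-based row filter with a parse-based one on column 'A' of the sample data; the variable names are illustrative, and which behavior is preferable depends on the data.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, '2', 3, '4', 5],
                   'B': [1.1, 2.2, 3.3, 4.4, 5.5],
                   'C': ['a', 'b', 'c', 'd', 'e']})

# Type-based: keep rows where 'A' already holds an int or float
type_based = df[df['A'].map(lambda x: isinstance(x, (int, float)))]

# Parse-based: keep rows where 'A' can be converted to a number
parsed = pd.to_numeric(df['A'], errors='coerce')
parse_based = df[parsed.notna()]

print(type_based.index.tolist())   # rows 0, 2 and 4 remain
print(len(parse_based))            # all 5 rows remain
```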

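3. Use the applymap() method to convert non-numeric data

# Use the applymap() method to convert non-numeric data to NaN
# (applymap() is deprecated since pandas 2.1; newer versions use DataFrame.map() instead)
df_numeric = df.applymap(lambda x: pd.to_numeric(x, errors='coerce'))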
print("DataFrame after converting non-numeric data:")
print(df_numeric)

4. Use the to_numeric() function to convert non-numeric data

# Use the to_numeric() function to convert non-numeric data to NaN, one column at a time
df_numeric = df.apply(pd.to_numeric, errors='coerce')
print("DataFrame after converting non-numeric data:")
print(df_numeric)
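Converting to NaN marks the non-numeric values but does not remove them. One possible follow-up, sketched here with the same sample data, is to call dropna() afterwards; the names converted and cleaned are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, '2', 3, '4', 5],
                   'B': [1.1, 2.2, 3.3, 4.4, 5.5],
                   'C': ['a', 'b', 'c', 'd', 'e']})

# Convert every column; values that cannot be parsed become NaN
converted = df.apply(pd.to_numeric, errors='coerce')

# Drop columns that contain nothing but NaN (here: column 'C')
cleaned = converted.dropna(axis=1, how='all')

# Drop any remaining rows that still contain NaN
cleaned = cleaned.dropna()

print(cleaned)
```

In this data the strings '2' and '4' in column 'A' parse cleanly, so 'A' survives as a numeric column and no rows are lost; only the text column 'C' is removed.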

Application example: Processing sales data

Suppose a DataFrame of sales data stores revenue as text with a currency symbol. Before any analysis, the non-numeric characters need to be stripped and the column converted to a numeric type.

# Create a DataFrame containing sales data
sales_data = {'Date': ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'],
              'Product': ['A', 'B', 'C', 'D', 'E'],
              'Revenue': ['$100', '$200', '$300', '$400', '$500']}
df_sales = pd.DataFrame(sales_data)

# Strip dollar signs and commas from the Revenue column, then convert it to float
df_sales['Revenue'] = df_sales['Revenue'].replace(r'[\$,]', '', regex=True).astype(float)

print("Cleaned sales data:")
print(df_sales)

In this example, a DataFrame containing sales data is created. A regular expression strips the dollar signs and commas from the Revenue column, and the cleaned strings are then converted to the float type.
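The astype(float) call raises a ValueError if any value still cannot be parsed after the symbols are stripped. A more tolerant variant, sketched below, uses pd.to_numeric() with errors='coerce' instead. The revenue values here, including the 'N/A' entry, are hypothetical and chosen only to show the behavior.

```python
import pandas as pd

# Hypothetical revenue strings, one of which is not a number at all
revenue = pd.Series(['$1,200', '$300', 'N/A'])

# Strip dollar signs and commas, then convert; unparseable values become NaN
stripped = revenue.str.replace(r'[\$,]', '', regex=True)
cleaned = pd.to_numeric(stripped, errors='coerce')

print(cleaned)
```

The resulting NaN can then be dropped with dropna() or filled with a default, depending on what the analysis requires.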

Summary

This article showed how to use Pandas to delete non-numeric data from a DataFrame. It first covered ways to identify non-numeric data: the dtypes attribute, the select_dtypes() method, and the info() method. It then presented several ways to remove or convert that data: the drop() method, Boolean indexing, the applymap() method, and the to_numeric() function. Finally, a practical example demonstrated how to clean non-numeric values in sales data.
