SoFunction
Updated on 2025-03-05

Detailed explanation of common data types in Pandas

Common data types for Pandas

The pandas extension library provides the following common data structures:

(1)Series: One-dimensional array with labels

(2)DatetimeIndex: Time series

(3)DataFrame: A two-dimensional table structure with labels and variable size

(4)Panel: 3D array with labels and variable size (removed in pandas 0.25)
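As a rough illustration of the first three structures, the sketch below builds one of each. The labels, dates, and column names are made up for the example and are not taken from the article.

```python
import pandas as pd

# Series: one-dimensional labeled array (labels here are illustrative)
s = pd.Series([90, 92, 98], index=['Chinese', 'math', 'Python'])

# DatetimeIndex: a sequence of timestamps, built here with date_range()
idx = pd.date_range(start='2025-01-01', periods=3, freq='D')

# DataFrame: two-dimensional labeled table, indexed here by the dates above
df = pd.DataFrame({'score': [90, 92, 98], 'passed': [True, True, True]}, index=idx)

print(type(s).__name__, s.ndim)
print(type(idx).__name__)
print(type(df).__name__, df.shape)
```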

1. One-dimensional arrays and common operations

A Series consists of two parts, an index and values, and behaves like a dictionary.

The values can be of different types. If no index is specified at creation time, non-negative integers starting from 0 are used automatically as the index.
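A minimal sketch of these properties, using made-up values: a Series with mixed value types gets the default integer index, and a Series built from a dictionary can be queried by key much like the dictionary itself.

```python
import pandas as pd

# Mixed value types are allowed; pandas stores them with the generic object dtype
mixed = pd.Series([1, 'two', 3.0])
print(mixed.index.tolist())   # default index: 0, 1, 2
print(mixed.dtype)

# Dictionary-like access: membership tests and lookups go through the index
scores = pd.Series({'math': 92, 'Python': 98})
print('math' in scores)
print(scores['Python'])
```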

import pandas as pd
import matplotlib.pyplot as plt

# Align output columns when labels contain wide (East Asian) characters
pd.set_option('display.unicode.ambiguous_as_wide',True)
pd.set_option('display.unicode.east_asian_width',True)

# Automatically create a non-negative integer index starting from 0
s1=pd.Series(range(1,20,5))
# Create a Series from a dictionary, using the dictionary keys as the index
s2=pd.Series({'Chinese':90,'math':92,'Python':98,'physics':87,'Chemical':92})
# Modify the value at the specified index
s1[3]=-17
s2['Chinese']=94

print('s1 raw data'.ljust(20,'='))
print(s1,'\n')
print('Absolute values of all data in s1'.ljust(20,'='))
print(abs(s1),'\n')
print('Prefix each index label of s1 with 2'.ljust(20,'='))
print(s1.add_prefix('2'),'\n')

print('s2 raw data'.ljust(20,'='))
print(s2,'\n')
print('Histogram of s2 data'.ljust(20,'='))
s2.hist()
plt.show()
print('The index of each row of s2 is followed by _Zhang San'.ljust(20,'='))
print(s2.add_suffix('_Zhang San'),'\n')
print('Index of the maximum value in s2'.ljust(20,'='))
print(s2.idxmax(),'\n')
print('Test whether each value of s2 is within the specified interval'.ljust(20,'='))
# inclusive='both' includes both endpoints (pandas 1.3+; older versions used inclusive=True)
print(s2.between(90,94,inclusive='both'),'\n')
print('View scores above 90 in s2'.ljust(20,'='))
print(s2[s2>90],'\n')
print('View data greater than the median in s2'.ljust(20,'='))
print(s2[s2>s2.median()],'\n')
print('Arithmetic between s2 and numbers'.ljust(20,'='))
print(round((s2**0.5)*10,1),'\n')
print('The smallest 2 values in s2'.ljust(20,'='))
print(s2.nsmallest(2),'\n')

# The four arithmetic operations and exponentiation can be performed between two Series objects of equal length
# Only values whose index labels appear in both Series objects are calculated
# Values for index labels that are not shared become null (NaN)
print('Add two Series objects'.ljust(20,'='))
print(pd.Series(range(5))+pd.Series(range(5,10)),'\n')

# The pipe() method supports chained function calls
print('Add 3 to each value, then multiply by 3'.ljust(20,'='))
print(pd.Series(range(5)).pipe(lambda x:x+3).pipe(lambda x:x*3),'\n')
print('Remainder of each squared value divided by 5'.ljust(20,'='))
print(pd.Series(range(5)).pipe(lambda x,y,z:(x**y)%z,2,5),'\n')

# The apply() method applies a function to each value of the Series object
print('Add 3 to each value'.ljust(20,'='))
print(pd.Series(range(5)).apply(lambda x:x+3),'\n')

print('Standard deviation, unbiased variance, standard error of the mean'.ljust(20,'='))
print(pd.Series(range(5)).std(),'\n')
print(pd.Series(range(5)).var(),'\n')
print(pd.Series(range(5)).sem(),'\n')

print('Check whether any value is equivalent to True'.ljust(20,'='))
print(any(pd.Series([3,0,True])),'\n')

print('Check whether all values are equivalent to True'.ljust(20,'='))
print(all(pd.Series([3,0,True])))
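The listing above adds two Series that share the same default index, so it never produces a null value. The sketch below, using made-up labels, shows what happens when the indexes only partly overlap, and one possible way to treat missing labels as zero with `Series.add(fill_value=0)`.

```python
import pandas as pd

# Two Series whose index labels only partly overlap ('y' and 'z' are shared)
a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20, 30], index=['y', 'z', 'w'])

# Values are matched by label, not by position; unshared labels give NaN
total = a + b
print(total)

# Optional: treat a label missing from one side as 0 instead of producing NaN
filled = a.add(b, fill_value=0)
print(filled)
```

Whether filling with zero is appropriate depends on the data; for scores or measurements, leaving the result as NaN is often the safer choice.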

2. Time series and common operations

Use pandas' date_range() function to generate a time series object:

date_range(start=None,end=None,periods=None,freq='D',tz=None,normalize=False,name=None,closed=None,**kwargs)
  • (1) start and end specify the start and end date and time
  • (2) periods specifies the number of data points to generate
  • (3) freq specifies the time interval; the default is 'D', meaning that adjacent dates are one day apart

Note that in pandas 2.0 and later the closed parameter has been replaced by inclusive.
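The parameters described above can be combined in a few ways. The following is a small sketch with arbitrary example dates; the variable names are illustrative only.

```python
import pandas as pd

# start + end: one entry per day (the default freq) between the two dates, inclusive
by_end = pd.date_range(start='2025-03-01', end='2025-03-05')
print(by_end)

# start + periods + freq: four entries, seven days apart
by_count = pd.date_range(start='2025-03-01', periods=4, freq='7D')
print(by_count)

# end + periods: the last three days up to and including the end date
backwards = pd.date_range(end='2025-03-05', periods=3)
print(backwards)
```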

Summary

The above is based on personal experience. I hope it serves as a useful reference, and I appreciate your support.