SoFunction
Updated on 2025-03-03

How to use the columns attribute to get the header name of each column of a DataFrame

Using the columns attribute to get the header name of each column of a DataFrame

Sometimes we want to get the header names (i.e. the column names) of an existing data table to see what data it stores. Printing the table's columns attribute does this.

The following code demonstrates it:

import pandas as pd

dict_data = {
    'student': ["Li Lei", "Han Meimei", "Tom"],
    'score': [95, 98, 92],
    'gender': ['M', 'F', 'M']
}
# Build one DataFrame with the default column order and one with an explicit order
df_data1 = pd.DataFrame(dict_data)
df_data2 = pd.DataFrame(dict_data, columns=['gender', 'student', 'score'])

print(df_data1)
print(df_data2)
print(df_data1.columns)
print(df_data2.columns)

The results after running are as follows:

      student  score gender
0      Li Lei     95      M
1  Han Meimei     98      F
2         Tom     92      M
  gender     student  score
0      M      Li Lei     95
1      F  Han Meimei     98
2      M         Tom     92
Index(['student', 'score', 'gender'], dtype='object')
Index(['gender', 'student', 'score'], dtype='object')

As the output shows, the columns list is essentially an Index object.
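Because columns is an Index object rather than a plain Python list, it sometimes needs converting before use. The following is a minimal sketch that reuses the column names from the example above; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'student': ["Li Lei", "Han Meimei", "Tom"],
    'score': [95, 98, 92],
    'gender': ['M', 'F', 'M'],
})

# columns is a pandas Index, not a Python list
is_index = isinstance(df.columns, pd.Index)

# Convert it to a plain list when a list is required
column_names = df.columns.tolist()

# Membership tests work directly on the Index
has_score = 'score' in df.columns

print(is_index, column_names, has_score)
```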

Some tips for working with pandas DataFrames

1. Sort by date

df211['rq'] = pd.to_datetime(df211['rq'])
df211 = df211.sort_values(['rq']).reset_index(drop=True)

1) df = df.sort_values(by='date') also works.

2) After the above operation, the values in 'rq' are of pandas Timestamp type. They can be converted to a date or a datetime as follows:

ts=(list(df211['rq'])[0]).date()

ts=(list(df211['rq'])[0]).to_pydatetime()
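The steps in this tip can be put together as a small runnable sketch. The sample data and the 'value' column are hypothetical; only the 'rq' column name comes from the text above.

```python
import pandas as pd

# Hypothetical sample data; 'rq' holds the dates as strings
df = pd.DataFrame({
    'rq': ['2015-03-01', '2015-01-04', '2015-02-10'],
    'value': [3, 1, 2],
})

# Parse the strings, then sort chronologically and renumber the rows
df['rq'] = pd.to_datetime(df['rq'])
df = df.sort_values('rq').reset_index(drop=True)

first = df['rq'].iloc[0]             # a pandas Timestamp
as_date = first.date()               # a datetime.date
as_datetime = first.to_pydatetime()  # a datetime.datetime

print(df)
print(type(first), as_date, as_datetime)
```

Using .iloc[0] avoids building a full list just to read the first element.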

To get the unique dates in a column and sort them:

import datetime

def change_date(s):
    s = datetime.datetime.strptime(s, "%Y-%m-%d")  # Standardize the date, e.g. 2015-1-4 => 2015-01-04 00:00:00
    s = str(s)  # strptime returns a datetime, so convert it back to str
    return s[:10]  # Keep only the first 10 characters (the date part)

data = list(df_0328['rq'].unique())
data = list(map(change_date, data))
print(type(data))
data.sort(key=lambda date: datetime.datetime.strptime(date, "%Y-%m-%d"))
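The same result can probably be reached without a helper function by letting pandas parse, de-duplicate, sort and format the column. This is only a sketch of an alternative, using hypothetical sample data in place of df_0328.

```python
import pandas as pd

# Hypothetical sample data with a repeated date
df = pd.DataFrame({'rq': ['2015-03-01', '2015-01-04', '2015-01-04', '2015-02-10']})

dates = pd.to_datetime(df['rq'])

# Drop repeats, sort chronologically, then format back to 'YYYY-MM-DD' strings
data = dates.drop_duplicates().sort_values().dt.strftime('%Y-%m-%d').tolist()

print(data)
```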

2. Remove specific rows and columns

df211 = df21.drop(df21[df21['road_name']!='Hui Xin Home'].index).reset_index(drop=True)
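Note that drop removes the rows whose index labels are passed to it, so the condition describes the rows to remove, not the rows to keep. The sketch below, on hypothetical data, compares drop with an equivalent boolean filter and also shows dropping a column by name.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'road_name': ['Hui Xin Home', 'Road A', 'Road B'],
    'flow': [10, 20, 30],
})

# Remove the rows whose road_name equals the given value
to_drop = df[df['road_name'] == 'Hui Xin Home'].index
dropped = df.drop(to_drop).reset_index(drop=True)

# Equivalent boolean filter: keep the rows that do not match
filtered = df[df['road_name'] != 'Hui Xin Home'].reset_index(drop=True)

# Remove a column by name
no_flow = df.drop(columns=['flow'])

print(dropped)
print(no_flow)
```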

3. Count the occurrences of each value in a column

df2['road_name'].value_counts()
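As a small illustration on hypothetical data, value_counts returns a Series indexed by the distinct values and sorted by count in descending order; passing normalize=True gives proportions instead of counts.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'road_name': ['A', 'B', 'A', 'C', 'A', 'B']})

counts = df['road_name'].value_counts()                # absolute counts
shares = df['road_name'].value_counts(normalize=True)  # proportions

print(counts)
print(shares)
```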

4. Process multiple tables embedded in a table & process multi-level table headers

1)

xl = pd.ExcelFile('Road Zone Model.xlsx', engine='openpyxl')
sheet_names = xl.sheet_names  # All sheet names
print(sheet_names)

2) There are many approaches, but I have not found a best one.

If there is a secondary header, then:

df0512 = pd.read_excel('Road Zone Model.xlsx',engine='openpyxl',\
                       sheet_name='Table 2',header=[0,1])
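Reading with header=[0,1] produces columns with a two-level MultiIndex, so each column is addressed by a tuple. The sketch below builds such a frame in memory instead of reading the Excel file; the level names 'SF', 'YT', 'orders' and 'weight' are made up for illustration.

```python
import pandas as pd

# Hypothetical two-level header, built in memory rather than read from Excel
columns = pd.MultiIndex.from_tuples([
    ('SF', 'orders'), ('SF', 'weight'), ('YT', 'orders'),
])
df = pd.DataFrame([[1, 2.5, 3], [4, 5.5, 6]], columns=columns)

# Select a single column with a (level 0, level 1) tuple
sf_orders = df[('SF', 'orders')]

# Select every column under one top-level header
sf_block = df['SF']

# Flatten the two levels into single strings if a flat header is easier to work with
flat = df.copy()
flat.columns = ['_'.join(col) for col in flat.columns]

print(sf_orders.tolist(), list(sf_block.columns), list(flat.columns))
```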

5. Extract the numeric values from a column / remove non-numeric items

Using pd.to_numeric

b_ = [x for x in list(df21[('SF', 'B-side order collection')]) if not pd.isna(pd.to_numeric(x, errors='coerce'))]
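With errors='coerce', pd.to_numeric turns anything it cannot parse into NaN, which makes it possible to split a column into numeric and non-numeric items in a vectorized way. A minimal sketch on a hypothetical Series:

```python
import pandas as pd

# Hypothetical column mixing numbers, numeric strings and junk
s = pd.Series(['12', '3.5', 'n/a', 0, 'abc', 7])

# Unparseable entries become NaN
numeric = pd.to_numeric(s, errors='coerce')

values = numeric.dropna().tolist()      # the numeric values, as floats
non_numeric = s[numeric.isna()].tolist()  # the original items that failed to parse

print(values)
print(non_numeric)
```

Testing for NaN explicitly matters here: a plain truthiness check would treat a legitimate 0 as missing.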

6. Take the first ten characters of a string column

df_deliver['date'] = df_deliver['create_time'].str[:10]

7. Extract the hour from a time

df_deliver['hour'] = pd.to_datetime(df_deliver['time']).apply(lambda x: x.hour)
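Once a column has been parsed with pd.to_datetime, the .dt accessor exposes the date parts directly, which is usually simpler than applying a lambda. A sketch with hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'time': ['2021-05-12 08:30:00', '2021-05-13 17:45:10']})

parsed = pd.to_datetime(df['time'])

df['hour'] = parsed.dt.hour                   # integer hour of the day
df['date'] = parsed.dt.strftime('%Y-%m-%d')   # date part as a string

print(df)
```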

8. Coordinate conversion

import math

def GCJ2WGS(lat, lon):
    # location format is as follows: locations[1] = "113.923745,22.530824"
    a = 6378245.0  # Krasovsky ellipsoid semi-major axis a
    ee = 0.00669342162296594323  # Krasovsky ellipsoid first eccentricity squared
    PI = 3.14159265358979324  # Pi
    # The following is the conversion formula
    x = lon - 105.0
    y = lat - 35.0
    # longitude
    dLon = 300.0 + x + 2.0 * y + 0.1 * x * x + 0.1 * x * y + 0.1 * math.sqrt(abs(x))
    dLon += (20.0 * math.sin(6.0 * x * PI) + 20.0 * math.sin(2.0 * x * PI)) * 2.0 / 3.0
    dLon += (20.0 * math.sin(x * PI) + 40.0 * math.sin(x / 3.0 * PI)) * 2.0 / 3.0
    dLon += (150.0 * math.sin(x / 12.0 * PI) + 300.0 * math.sin(x / 30.0 * PI)) * 2.0 / 3.0
    # latitude
    dLat = -100.0 + 2.0 * x + 3.0 * y + 0.2 * y * y + 0.1 * x * y + 0.2 * math.sqrt(abs(x))
    dLat += (20.0 * math.sin(6.0 * x * PI) + 20.0 * math.sin(2.0 * x * PI)) * 2.0 / 3.0
    dLat += (20.0 * math.sin(y * PI) + 40.0 * math.sin(y / 3.0 * PI)) * 2.0 / 3.0
    dLat += (160.0 * math.sin(y / 12.0 * PI) + 320 * math.sin(y * PI / 30.0)) * 2.0 / 3.0
    radLat = lat / 180.0 * PI
    magic = math.sin(radLat)
    magic = 1 - ee * magic * magic
    sqrtMagic = math.sqrt(magic)
    dLat = (dLat * 180.0) / ((a * (1 - ee)) / (magic * sqrtMagic) * PI)
    dLon = (dLon * 180.0) / (a / sqrtMagic * math.cos(radLat) * PI)
    # Subtract the offsets to get the (approximate) WGS-84 coordinates
    wgsLon = lon - dLon
    wgsLat = lat - dLat
    return wgsLat, wgsLon

lat = list(df_1['lat'])
lon=list(df_1['lon'])
data=list(map(GCJ2WGS,lat,lon) )
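The map call returns a list of (lat, lon) tuples. One way to write the results back into the DataFrame is to unzip that list into two columns. In the sketch below, shift is a hypothetical stand-in for the conversion function, and the column names wgs_lat and wgs_lon are assumptions.

```python
import pandas as pd

def shift(lat, lon):
    # Hypothetical stand-in for a real conversion function; it returns a (lat, lon) tuple
    return lat - 0.002, lon - 0.005

# Hypothetical sample coordinates
df = pd.DataFrame({'lat': [22.530824, 39.9], 'lon': [113.923745, 116.4]})

# Apply the two-argument function pairwise over the two columns
data = list(map(shift, df['lat'], df['lon']))

# Unzip the list of tuples into two new columns
df['wgs_lat'], df['wgs_lon'] = zip(*data)

print(df)
```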

Summary

The above is based on my personal experience. I hope it serves as a useful reference, and I appreciate your continued support.