
Three methods of parsing parameters in Python, explained in detail

The main purpose of today's article is to make your code more efficient to work with by driving it from the command line and from configuration files in Python.

Let's go!

Let's practice with the process of tuning hyperparameters for a machine learning model; there are three ways to go about it. The first option is to use argparse, a popular Python module specialized in command-line parsing. Another way is to read a JSON file, where we can place all of our hyperparameters. The third, lesser-known, option is to use a YAML file! Curious? Let's get started!

Prerequisites

In the code below, I'll be using Visual Studio Code, a very efficient integrated Python development environment. The beauty of this tool is that it supports every programming language by installing extensions, integrates with the terminal, and allows for working with tons of Python scripts and Jupyter notebooks at the same time.

Of course, if you don't know how to configure VSCode yet, you can look here:

Hands-On Turning Visual Studio Code into a Python Development Godsend

For the dataset, we use the bike sharing dataset from Kaggle, which can be downloaded here or accessed at the end of the article.

Using argparse

For this small project, we use a standard folder structure:

  • The folder named data that contains our dataset
  • The training script (referred to as train.py in this article)
  • The options.py file for specifying the hyperparameters

First, we can create the training script, train.py, in which we have the basic procedure for importing the data, training the model on the training data, and evaluating it on the test set:

import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error

from options import train_options

# Read the bike sharing dataset (adjust the file name to match your download)
df = pd.read_csv('data/bike_sharing.csv')
print(df.head())
opt = train_options()

X = df.drop(['instant','dteday','atemp','casual','registered','cnt'], axis=1).values
y = df['cnt'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

if opt.normalize:
    # Fit the scaler on the training set and apply it to both splits
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

rf = RandomForestRegressor(n_estimators=opt.n_estimators, max_features=opt.max_features, max_depth=opt.max_depth)
model = rf.fit(X_train, y_train)
y_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
mae = mean_absolute_error(y_test, y_pred)
print("rmse: ", rmse)
print("mae: ", mae)

In the code, we also import the train_options function contained in options.py. The latter is a Python file from which we can change the hyperparameters considered in the training script:

import argparse

def train_options():
    parser = argparse.ArgumentParser()
    parser.add_argument("--normalize", default=True, type=bool, help='whether to normalize the features')
    parser.add_argument("--n_estimators", default=100, type=int, help='number of estimators')
    parser.add_argument("--max_features", default=6, type=int, help='maximum number of features')
    parser.add_argument("--max_depth", default=5, type=int, help='maximum depth')
    opt = parser.parse_args()
    return opt

In this example, we use the argparse library, which is very popular for parsing command line arguments. First, we initialize the parser, and then, we can add the arguments we want to access.

This is an example of running the code (assuming the training script is called train.py, as above):

python train.py

There are two ways to change the default value of a hyperparameter. The first option is to set a different default value in the options.py file. The other option is to pass the hyperparameter values from the command line:

python train.py --n_estimators 200

We need to specify the name and corresponding value of the hyperparameter to be changed.

python train.py --n_estimators 200 --max_depth 7
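
Under the hood, parse_args() returns an argparse.Namespace, which is why the training script reads the values with attribute syntax such as opt.n_estimators. Here is a minimal, self-contained sketch (the print calls are only for illustration, they are not part of the training script):

import argparse

# Minimal sketch: each added argument becomes an attribute of the Namespace
# returned by parse_args().
parser = argparse.ArgumentParser()
parser.add_argument("--n_estimators", default=100, type=int)
parser.add_argument("--max_depth", default=5, type=int)
opt = parser.parse_args()

print(opt.n_estimators, opt.max_depth)  # e.g. 200 7 when overridden as above
print(vars(opt))                        # all parsed values as a dictionary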

Using JSON Files

As before, we can maintain a similar file structure. In this case, we are replacing the options.py file with a JSON file. In other words, we want to specify the values of the hyperparameters in the JSON file and pass them to the training script. JSON files, which store data as key-value pairs, can be a quick and intuitive alternative to the argparse library. Below we create a JSON file (called parameters.json here) that contains the data we'll need to pass to the rest of the code later.

{
    "normalize": true,
    "n_estimators": 100,
    "max_features": 6,
    "max_depth": 5
}

As you can see above, it is very similar to a Python dictionary, but unlike a dictionary, it stores data in text/string form. Some common data types also have slightly different syntax: for example, Boolean values are written false/true, while Python uses False/True. Another possible value type in JSON is the array, written with square brackets, which corresponds to a Python list.
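
To make that mapping concrete, here is a small sketch of how those JSON types are converted when read from Python (the feature_subset key is purely hypothetical, added only to show an array):

import json

# Hypothetical JSON text illustrating booleans and arrays
text = '{"normalize": true, "max_depth": 5, "feature_subset": [4, 6, 8]}'
parameters = json.loads(text)

print(parameters["normalize"])       # True  (JSON true -> Python True)
print(parameters["feature_subset"])  # [4, 6, 8]  (JSON array -> Python list)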

The beauty of using JSON data in Python is that it can be converted into a Python dictionary with the load method of the built-in json module:

import json

f = open("parameters.json", "r")
parameters = json.load(f)

To access a specific item, we just need to reference its key name in square brackets:

if parameters["normalize"]:
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

rf = RandomForestRegressor(n_estimators=parameters["n_estimators"], max_features=parameters["max_features"], max_depth=parameters["max_depth"], random_state=42)
model = rf.fit(X_train, y_train)
y_pred = model.predict(X_test)
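
The evaluation part of the script stays the same as in the argparse version; for completeness, a minimal sketch (assuming the imports from the first script):

rmse = np.sqrt(mean_squared_error(y_test, y_pred))
mae = mean_absolute_error(y_test, y_pred)
print("rmse: ", rmse)
print("mae: ", mae)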

Using YAML Files

A final option is to exploit the potential of YAML. As with JSON files, we read the YAML file in Python code as a dictionary to access the values of the hyperparameters. YAML is a human-readable data-representation language in which hierarchies are expressed with indentation rather than with braces as in a JSON file. Below we show what the YAML file (called options.yaml here) will contain:

normalize: True 
n_estimators: 100
max_features: 6
max_depth: 5

We then open the file, which is converted to a Python dictionary by the load method, this time imported from the yaml library:

import yaml

f = open('options.yaml', 'rb')
parameters = yaml.load(f, Loader=yaml.SafeLoader)

As before, we can access the values of the hyperparameters using standard dictionary syntax.
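
Since parameters is once again a plain Python dictionary, the training code can stay identical to the JSON version, for example:

# parameters was loaded from the YAML file above; the keys match options.yaml
rf = RandomForestRegressor(n_estimators=parameters["n_estimators"],
                           max_features=parameters["max_features"],
                           max_depth=parameters["max_depth"])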

Final thoughts

Configuration files are very quick to write, whereas argparse requires a line of code for each argument we want to add.

So we should choose the most suitable approach for our particular situation.

For example, if we need to annotate a parameter with a comment, JSON is not suitable because it doesn't allow comments, whereas YAML and argparse are well suited.
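
As a quick illustration of that last point, here is a minimal sketch showing that yaml simply ignores comments when parsing (the keys mirror the options.yaml file above):

import yaml

# YAML allows inline comments, which JSON does not
config_text = """
n_estimators: 100   # number of trees in the forest
max_depth: 5        # limit how deep each tree can grow
"""
print(yaml.safe_load(config_text))  # {'n_estimators': 100, 'max_depth': 5}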

This concludes this detailed article on three methods of parsing parameters in Python. For more on parsing parameters in Python, please search my previous articles or continue to browse the related articles below. I hope you will support me in the future!