The main purpose of today's share is to make your code more efficient to work with by using the command line and configuration files in Python.
Let's go!
Let's practice on the process of tuning hyperparameters in machine learning; there are three ways to go about it. The first option is to use argparse, a popular Python module specialized in command-line parsing; another way is to read a JSON file where we can place all the hyperparameters; and the third and lesser-known way is to use a YAML file! Curious? Let's get started!
Prerequisites
In the code below, I'll be using Visual Studio Code, a very efficient integrated development environment for Python. The beauty of this tool is that it supports many programming languages through extensions, integrates with the terminal, and lets you work with many Python scripts and Jupyter notebooks at the same time.
Of course, if you don't know how to configure VS Code yet, you can take a look at this article:
Hands-On Turning Visual Studio Code into a Python Development Godsend
As for the dataset, we'll use the Bike Sharing dataset from Kaggle, which can be downloaded here or accessed at the end of the article.
Using argparse
We organize our small project with a standard structure:
- A data folder that contains our dataset
- The training script, which I'll call train.py here
- An options.py file for specifying the hyperparameters
First, we create the train.py file, in which we have the basic procedure of importing the data, training the model on the training data, and evaluating it on the test set:
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error
from options import train_options

# read the bike sharing data (adjust the CSV file name to your download)
df = pd.read_csv('data/hour.csv')
print(df.head())

# parse the hyperparameters defined in options.py
opt = train_options()

X = df.drop(['instant', 'dteday', 'atemp', 'casual', 'registered', 'cnt'], axis=1).values
y = df['cnt'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

if opt.normalize == True:
    scaler = StandardScaler()
    X = scaler.fit_transform(X)

rf = RandomForestRegressor(n_estimators=opt.n_estimators, max_features=opt.max_features, max_depth=opt.max_depth)
model = rf.fit(X_train, y_train)
y_pred = model.predict(X_test)

rmse = np.sqrt(mean_squared_error(y_pred, y_test))
mae = mean_absolute_error(y_pred, y_test)
print("rmse: ", rmse)
print("mae: ", mae)
In the code, we also import the train_options function contained in the options.py file. The latter is the Python file where we define, and can change, the hyperparameters used in train.py:
import argparse

def train_options():
    parser = argparse.ArgumentParser()
    parser.add_argument("--normalize", default=True, type=bool, help='normalize the features')
    parser.add_argument("--n_estimators", default=100, type=int, help='number of estimators')
    parser.add_argument("--max_features", default=6, type=int, help='maximum number of features')
    parser.add_argument("--max_depth", default=5, type=int, help='maximum depth')
    opt = parser.parse_args()
    return opt
In this example, we use the argparse library, which is very popular for parsing command line arguments. First, we initialize the parser, and then, we can add the arguments we want to access.
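One caveat worth flagging (a side note, not something from the original script): the --normalize argument uses type=bool, and argparse converts the raw command-line string with bool(), so even --normalize False evaluates to True because bool("False") is truthy. A small sketch of the usual workaround with a store_true flag:

import argparse

# Side note, not part of the original options.py: a flag that defaults to False
# and is switched on only when passed avoids the type=bool pitfall.
parser = argparse.ArgumentParser()
parser.add_argument("--normalize", action="store_true", help='normalize the features')
opt = parser.parse_args()  # run as: python train.py --normalize
print(opt.normalize)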
This is an example of running the code:
python train.py
There are two ways to change the default value of a hyperparameter. The first option is to set a different default value in options.py. The other option is to pass the hyperparameter value from the command line:
python train.py --n_estimators 200
We need to specify the name and corresponding value of the hyperparameter to be changed.
python train.py --n_estimators 200 --max_depth 7
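If you want to double-check which values the script actually received, a quick (hypothetical) addition at the top of train.py is to print the parsed namespace:

from options import train_options

# Hypothetical debugging lines, not in the original train.py: print the parsed
# hyperparameters to confirm that the command-line overrides took effect.
opt = train_options()
print(vars(opt))  # e.g. {'normalize': True, 'n_estimators': 200, 'max_features': 6, 'max_depth': 7}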
Using JSON Files
As before, we can maintain a similar file structure. In this case, we replace options.py with a JSON file. In other words, we specify the values of the hyperparameters in the JSON file and pass them to train.py. JSON files, which store data as key-value pairs, can be a quick and intuitive alternative to the argparse library. Below we create a JSON file (I'll call it rf_parameters.json here) that contains the values we'll need to pass to the training code later.
{ "normalize":true, "n_estimators":100, "max_features":6, "max_depth":5 }
As you can see above, it is very similar to a Python dictionary. But unlike a dictionary, it stores the data as text, and some common data types have slightly different syntax. For example, Boolean values are written false/true, while Python writes False/True. JSON arrays are another possible value; they are written in square brackets and become Python lists when loaded.
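To make the mapping concrete, here is a small sketch (the dictionary contents are made up) of how Python types are written in JSON and converted back:

import json

# Minimal sketch of the Python-to-JSON type mapping described above.
params = {"normalize": True, "features_to_drop": ["casual", "registered"]}
text = json.dumps(params, indent=4)
print(text)              # True is written as true, the list becomes a JSON array
print(json.loads(text))  # json.loads turns the text back into a Python dict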
The beauty of using JSON data in Python is that it can be converted into a Python dictionary with the json.load method:
f = open("", "rb") parameters = (f)
To access a specific item, we just need to reference its key name in square brackets:
if parameters["normalize"] == True: scaler = StandardScaler() X = scaler.fit_transform(X) rf=RandomForestRegressor(n_estimators=parameters["n_estimators"],max_features=parameters["max_features"],max_depth=parameters["max_depth"],random_state=42) model = (X_train,y_train) y_pred = (X_test)
Using YAML Files
A final option is to utilize the potential of YAML. As with JSON files, we read the YAML file in Python code as a dictionary to access the values of the hyperparameters. YAML is a human-readable data-serialization language in which hierarchies are expressed through indentation (typically two spaces) rather than the braces and brackets used in JSON. Below we show what the YAML file (I'll call it rf_parameters.yaml) will contain:
normalize: True
n_estimators: 100
max_features: 6
max_depth: 5
In train.py, we open this file and convert it, as always, into a Python dictionary with the load method, this time imported from the yaml library:
import yaml

f = open('rf_parameters.yaml', 'rb')
parameters = yaml.load(f, Loader=yaml.FullLoader)  # FullLoader is one common choice
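Since our configuration only contains plain scalars, yaml.safe_load is an equivalent and slightly safer alternative (a sketch, same hypothetical file name):

import yaml

# safe_load refuses to construct arbitrary Python objects, which is all we need
# for a simple hyperparameter file like this one.
with open('rf_parameters.yaml') as f:
    parameters = yaml.safe_load(f)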
As before, we can access the values of the hyperparameters using the syntax required by the dictionary.
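Earlier we said that YAML expresses hierarchies through indentation. Our configuration is flat, but here is a short sketch (the nested keys are invented for illustration) showing how a nested file comes back as nested dictionaries:

import yaml

# Hypothetical nested configuration, only to show indentation-based hierarchy.
config_text = """
model:
  n_estimators: 100
  max_depth: 5
preprocessing:
  normalize: True
"""
parameters = yaml.load(config_text, Loader=yaml.FullLoader)
print(parameters["model"]["max_depth"])  # 5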
Final thoughts
Configuration files are very quick to write and modify, whereas argparse requires a line of code for every parameter we want to add.
So we should choose the approach that best suits our situation.
For example, if we need to annotate a parameter, JSON is not a good fit because it doesn't allow comments, whereas YAML and argparse do (YAML through # comments, argparse through the help argument).
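As a quick illustration of that last point, here is a sketch of a YAML snippet with inline comments, parsed from a string for brevity (the annotations are made up):

import yaml

# Unlike JSON, YAML allows # comments, so each hyperparameter can be annotated
# directly in the configuration file.
config_text = """
n_estimators: 100   # number of trees in the forest
max_depth: 5        # limit tree depth to reduce overfitting
"""
print(yaml.load(config_text, Loader=yaml.FullLoader))
# {'n_estimators': 100, 'max_depth': 5}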
This concludes the article on the three ways of parsing parameters in Python. For more content on parsing parameters in Python, please search my previous articles or continue to browse the related articles below. I hope you'll keep supporting me!