Background
After repeated tuning on a known data set, you end up with a model accurate enough to predict or classify new data of the same format. Do you then have to re-run the source data and training code every time you want to use that model? No: the common practice is to serialize the trained model into a model file, and simply load that file whenever you need to make predictions. Python's joblib module makes this easy.
1. Save the best model
joblib.dump(value, filename, compress=0, protocol=None)
- value: Any Python object to be stored on disk.
- filename: file name or file object. The file path or file object to store the data in. A path may carry one of the supported compression extensions (".z", ".gz", ".bz2", ".xz", ".lzma").
- compress: int from 0 to 9, bool, or 2-tuple. Optional compression level for the data. 0 or False means no compression; a higher value means stronger compression but slower reads and writes. A value of 3 is usually a good trade-off. If compress is True, compression level 3 is used. If compress is a 2-tuple, the first element must be a string naming one of the supported compressors ('zlib', 'gzip', 'bz2', 'lzma', 'xz'), and the second element must be an integer from 0 to 9 giving the compression level.
- protocol: the pickle protocol to use; it has the same meaning as the protocol parameter of pickle.dump.
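The parameters above can be exercised with a small round-trip sketch; the dict and the file names here are arbitrary examples, not from the article:

```python
import os
import tempfile

import joblib

# Any picklable Python object can be dumped; a dict stands in for a trained model.
obj = {"weights": [0.2, 0.5, 0.3], "bias": 0.1}

workdir = tempfile.mkdtemp()

# Default: no compression.
plain_path = os.path.join(workdir, "model.joblib")
joblib.dump(obj, plain_path)

# Integer level: 3 is the usual speed/size trade-off.
gz_path = os.path.join(workdir, "model.joblib.gz")
joblib.dump(obj, gz_path, compress=3)

# 2-tuple form: (compressor name, level from 0 to 9).
xz_path = os.path.join(workdir, "model.joblib.xz")
joblib.dump(obj, xz_path, compress=("xz", 6))

restored = joblib.load(gz_path)
print(restored == obj)  # True
```

Note that a filename ending in a compression extension such as ".gz" also selects that compressor automatically.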
An example
- Import data
```python
import pandas as pd

# Training set
file_pos = "F:\\python_machine_learing_work\\501_model\\data\\train set\\train_data_only_one.csv"
data_pos = pd.read_csv(file_pos, encoding='utf-8')

# Test set
val_pos = "F:\\python_machine_learing_work\\501_model\\data\\test set\\test_data_table_only_one.csv"
data_val = pd.read_csv(val_pos, encoding='utf-8')
```
- Divide data
```python
# Important variables
ipt_col = ['called_rate', 'calling_called_act_hour', 'calling_called_distinct_rp',
           'calling_called_distinct_cnt', 'star_level_int', 'online_days',
           'calling_called_raom_cnt', 'cert_cnt', 'white_flag_0', 'age',
           'calling_called_cdr_less_15_cnt', 'white_flag_1',
           'calling_called_same_area_rate', 'volte_cnt', 'cdr_duration_sum',
           'calling_hour_cnt', 'cdr_duration_avg', 'calling_pre7_rate',
           'cdr_duration_std', 'calling_disperate', 'calling_out_area_rate',
           'calling_distinct_out_op_area_cnt', 'payment_type_2.0',
           'package_price_group_2.0', 'is_vice_card_1.0']

target_col = 'label'  # assumption: the label column is never named in the original; adjust to your data

# Split the dataset (one training set and one test set)
def train_test_spl(train_data, val_data):
    global ipt_col
    X_train = train_data[ipt_col]
    X_test = val_data[ipt_col]
    y_train = train_data[target_col]
    y_test = val_data[target_col]
    return X_train, X_test, y_train, y_test

X_train, X_test, y_train, y_test = train_test_spl(data_pos, data_val)
```
- Training the model
```python
from sklearn.model_selection import GridSearchCV
# Import the XGBoost model
from xgboost import XGBClassifier

def model_train(X_train, y_train, model):
    if model == 'XGB':
        parameters = {'max_depth': [3, 5, 10, 15, 20, 25],
                      'learning_rate': [0.1, 0.3, 0.6],
                      'subsample': [0.6, 0.7, 0.8, 0.85, 0.95],
                      'colsample_bytree': [0.5, 0.6, 0.7, 0.8, 0.9]}
        xlf = XGBClassifier(n_estimators=50)
        grid = GridSearchCV(xlf, param_grid=parameters, scoring='accuracy', cv=3)
        grid.fit(X_train, y_train)
        best_params = grid.best_params_
        # Refit a model with the best hyper-parameters found by the grid search
        res_model = XGBClassifier(max_depth=best_params['max_depth'],
                                  learning_rate=best_params['learning_rate'],
                                  subsample=best_params['subsample'],
                                  colsample_bytree=best_params['colsample_bytree'])
        res_model.fit(X_train, y_train)
    else:
        pass
    return res_model

xgb_model = model_train(X_train, y_train, model='XGB')
```
- Save the model
```python
# Import package
import joblib

# Save the model
joblib.dump(xgb_model, 'train_rf_importance_model.dat', compress=3)
```
2. Load the model and use it for prediction
joblib.load(filename, mmap_mode=None)
- filename: file name or file object. The file path or file object from which to load the object.
- mmap_mode: {None, 'r+', 'r', 'w+', 'c'}, optional. If not None, arrays are memory-mapped from disk rather than read into memory. This mode has no effect on compressed files. Note that in this case the reconstructed object may no longer match the original object exactly.
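Memory mapping only takes effect for uncompressed files holding NumPy arrays. A small sketch (file name is an arbitrary example):

```python
import os
import tempfile

import joblib
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "big_array.joblib")

# The file must be uncompressed for memory mapping to take effect.
joblib.dump(np.arange(10.0), path)

# With mmap_mode="r" the array is mapped read-only from disk instead of
# being copied into RAM, which helps with arrays larger than memory.
view = joblib.load(path, mmap_mode="r")
print(type(view).__name__)  # memmap
print(float(view.sum()))    # 45.0
```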
Loading the model
```python
# Loading the model
load_model_xgb_importance = joblib.load("F:\\python_machine_learing_work\\501_model\\data\\test set\\train_xgb_importance_model.dat")

# Use the model to predict
y_pred_rf = model_predict(load_model_xgb_importance, X_test, alpha=alpha)
```
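The model_predict helper and the alpha argument are never defined in the article. A minimal sketch, assuming alpha is a probability threshold on the positive class (the helper and ToyModel here are hypothetical, not part of the original code):

```python
def model_predict(model, X, alpha=0.5):
    """Label a sample positive when its positive-class probability reaches alpha."""
    proba = model.predict_proba(X)
    return [1 if p[1] >= alpha else 0 for p in proba]


class ToyModel:
    """Stand-in for a fitted classifier exposing scikit-learn's predict_proba."""
    def predict_proba(self, X):
        # Treat the single feature as the positive-class probability.
        return [[1 - row[0], row[0]] for row in X]


preds = model_predict(ToyModel(), [[0.9], [0.2]], alpha=0.5)
print(preds)  # [1, 0]
```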
This concludes this article's detailed look at Python's joblib module. For more on the joblib module, please search my previous articles or continue browsing the related articles below, and I hope you will continue to support me!