Background
After repeated tuning on a known data set, you end up with a model accurate enough to predict or classify new data of the same format. Do you then have to re-run the source data and training code every time you want to use that model? No: the common practice is to serialize the trained model into a model file, and simply load that file whenever you need to make predictions. Python's joblib module makes this easy.
1. Save the best model
joblib.dump(value, filename, compress=0, protocol=None)
- value: Any Python object to be stored on disk.
- filename: file name or file object. The file path or file object to store the data in. A path may carry one of the supported compression extensions (".z", ".gz", ".bz2", ".xz", ".lzma").
- compress: int from 0 to 9, bool, or 2-tuple. Optional compression level for the data. 0 or False means no compression; a higher value means stronger compression but slower reads and writes. A value of 3 is usually a good trade-off. If compress is True, compression level 3 is used. If compress is a 2-tuple, the first element must be a string naming one of the supported compressors ('zlib', 'gzip', 'bz2', 'lzma', 'xz'), and the second element must be an integer from 0 to 9 giving the compression level.
- protocol: the pickle protocol to use; it has the same meaning as the protocol parameter of pickle.dump.
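The parameters above can be exercised with a small round-trip sketch; the dict and the file names here are arbitrary examples, not from the article:

```python
import os
import tempfile

import joblib

# Any picklable Python object can be dumped; a dict stands in for a trained model.
obj = {"weights": [0.2, 0.5, 0.3], "bias": 0.1}

workdir = tempfile.mkdtemp()

# Default: no compression.
plain_path = os.path.join(workdir, "model.joblib")
joblib.dump(obj, plain_path)

# Integer level: 3 is the usual speed/size trade-off.
gz_path = os.path.join(workdir, "model.joblib.gz")
joblib.dump(obj, gz_path, compress=3)

# 2-tuple form: (compressor name, level from 0 to 9).
xz_path = os.path.join(workdir, "model.joblib.xz")
joblib.dump(obj, xz_path, compress=("xz", 6))

restored = joblib.load(gz_path)
print(restored == obj)  # True
```

Note that a filename ending in a compression extension such as ".gz" also selects that compressor automatically.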
An example
- Import data
```python
import pandas as pd

# Training set
file_pos = "F:\\python_machine_learing_work\\501_model\\data\\train set\\train_data_only_one.csv"
data_pos = pd.read_csv(file_pos, encoding='utf-8')

# Test set
val_pos = "F:\\python_machine_learing_work\\501_model\\data\\test set\\test_data_table_only_one.csv"
data_val = pd.read_csv(val_pos, encoding='utf-8')
```
- Divide data
```python
# Important variables
ipt_col = ['called_rate', 'calling_called_act_hour', 'calling_called_distinct_rp',
           'calling_called_distinct_cnt', 'star_level_int', 'online_days',
           'calling_called_raom_cnt', 'cert_cnt', 'white_flag_0', 'age',
           'calling_called_cdr_less_15_cnt', 'white_flag_1',
           'calling_called_same_area_rate', 'volte_cnt', 'cdr_duration_sum',
           'calling_hour_cnt', 'cdr_duration_avg', 'calling_pre7_rate',
           'cdr_duration_std', 'calling_disperate', 'calling_out_area_rate',
           'calling_distinct_out_op_area_cnt', 'payment_type_2.0',
           'package_price_group_2.0', 'is_vice_card_1.0']

target_col = 'label'  # assumption: the label column is never named in the original; adjust to your data

# Split the dataset (one training set and one test set)
def train_test_spl(train_data, val_data):
    global ipt_col
    X_train = train_data[ipt_col]
    X_test = val_data[ipt_col]
    y_train = train_data[target_col]
    y_test = val_data[target_col]
    return X_train, X_test, y_train, y_test

X_train, X_test, y_train, y_test = train_test_spl(data_pos, data_val)
```
- Training the model
```python
from sklearn.model_selection import GridSearchCV
# Import the XGBoost model
from xgboost import XGBClassifier

def model_train(X_train, y_train, model):
    if model == 'XGB':
        parameters = {'max_depth': [3, 5, 10, 15, 20, 25],
                      'learning_rate': [0.1, 0.3, 0.6],
                      'subsample': [0.6, 0.7, 0.8, 0.85, 0.95],
                      'colsample_bytree': [0.5, 0.6, 0.7, 0.8, 0.9]}
        xlf = XGBClassifier(n_estimators=50)
        grid = GridSearchCV(xlf, param_grid=parameters, scoring='accuracy', cv=3)
        grid.fit(X_train, y_train)
        best_params = grid.best_params_
        # Refit a model with the best hyper-parameters found by the grid search
        res_model = XGBClassifier(max_depth=best_params['max_depth'],
                                  learning_rate=best_params['learning_rate'],
                                  subsample=best_params['subsample'],
                                  colsample_bytree=best_params['colsample_bytree'])
        res_model.fit(X_train, y_train)
    else:
        pass
    return res_model

xgb_model = model_train(X_train, y_train, model='XGB')
```
- Save the model
```python
# Import package
import joblib

# Save the model
joblib.dump(xgb_model, 'train_rf_importance_model.dat', compress=3)
```
2. Load the model and use it for prediction
joblib.load(filename, mmap_mode=None)
- filename: file name or file object. The file path or file object from which to load the object.
- mmap_mode: {None, 'r+', 'r', 'w+', 'c'}, optional. If not None, arrays are memory-mapped from disk rather than read into memory. This mode has no effect on compressed files. Note that in this case the reconstructed object may no longer match the original object exactly.
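Memory mapping only takes effect for uncompressed files holding NumPy arrays. A small sketch (file name is an arbitrary example):

```python
import os
import tempfile

import joblib
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "big_array.joblib")

# The file must be uncompressed for memory mapping to take effect.
joblib.dump(np.arange(10.0), path)

# With mmap_mode="r" the array is mapped read-only from disk instead of
# being copied into RAM, which helps with arrays larger than memory.
view = joblib.load(path, mmap_mode="r")
print(type(view).__name__)  # memmap
print(float(view.sum()))    # 45.0
```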
Loading the model
```python
# Loading the model
load_model_xgb_importance = joblib.load("F:\\python_machine_learing_work\\501_model\\data\\test set\\train_xgb_importance_model.dat")

# Use the model to predict
y_pred_rf = model_predict(load_model_xgb_importance, X_test, alpha=alpha)
```
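The model_predict helper and the alpha argument are never defined in the article. A minimal sketch, assuming alpha is a probability threshold on the positive class (the helper and ToyModel here are hypothetical, not part of the original code):

```python
def model_predict(model, X, alpha=0.5):
    """Label a sample positive when its positive-class probability reaches alpha."""
    proba = model.predict_proba(X)
    return [1 if p[1] >= alpha else 0 for p in proba]


class ToyModel:
    """Stand-in for a fitted classifier exposing scikit-learn's predict_proba."""
    def predict_proba(self, X):
        # Treat the single feature as the positive-class probability.
        return [[1 - row[0], row[0]] for row in X]


preds = model_predict(ToyModel(), [[0.9], [0.2]], alpha=0.5)
print(preds)  # [1, 0]
```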
This concludes this article's detailed look at Python's joblib module. For more on the joblib module, please search my previous articles or continue browsing the related articles below, and I hope you will continue to support me!