1 Overview
1.1 Linear regression
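For the general linear regression problem, the parameters are estimated by the least squares method, which minimizes the residual sum of squares:
$$\min_{w} \|Xw - y\|_2^2$$
where X is the matrix of input features, y is the vector of target values and w is the vector of coefficients.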
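As a minimal sketch of this objective (the synthetic data and true coefficients below are made up for illustration), the least squares solution can be computed directly with NumPy and compared with scikit-learn's LinearRegression, which solves the same problem:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data: y = 3*x0 - 2*x1 + 1 plus small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([3.0, -2.0]) + 1.0 + 0.1 * rng.normal(size=100)

# Append a column of ones so the intercept is estimated as an ordinary weight
X1 = np.hstack([X, np.ones((X.shape[0], 1))])

# Least squares: minimize ||X1 w - y||^2
w, *_ = np.linalg.lstsq(X1, y, rcond=None)

# scikit-learn fits the intercept separately but minimizes the same objective
lr = LinearRegression().fit(X, y)
print("NumPy:  ", w[:2], w[2])
print("sklearn:", lr.coef_, lr.intercept_)
```

Both printouts should agree up to floating-point round-off and lie close to the coefficients used to generate the data.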
1.2 Ridge regression
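Ridge regression is a biased-estimation regression method designed for the analysis of collinear data. It is a modified least squares method: by adding an L2 penalty on the coefficients, it gives up the unbiasedness of least squares (losing some information and precision) in exchange for coefficient estimates that are more stable and more realistic. As a result, on some data, in particular ill-conditioned data with strongly correlated features, it fits better than ordinary least squares. Its objective function is
$$\min_{w} \|Xw - y\|_2^2 + \alpha \|w\|_2^2$$
where α ≥ 0 controls the strength of the penalty.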
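A minimal sketch of the ridge objective on deliberately collinear data (the data, alpha=1.0 and the noise level are illustrative assumptions): setting the gradient of the objective to zero gives the closed form w = (XᵀX + αI)⁻¹Xᵀy, which should match scikit-learn's Ridge when no intercept is fitted.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
x = rng.normal(size=50)
# Two nearly identical columns: a hypothetical ill-conditioned (collinear) design
X = np.column_stack([x, x + 1e-3 * rng.normal(size=50)])
y = 2.0 * x + 0.1 * rng.normal(size=50)

alpha = 1.0
# Closed-form ridge solution: w = (X^T X + alpha * I)^(-1) X^T y
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# Without an intercept, sklearn minimizes ||y - Xw||^2 + alpha * ||w||^2
ridge = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)

# Ordinary least squares on the same data, for comparison
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print("ridge (closed form):", w_closed)
print("ridge (sklearn):    ", ridge.coef_)
print("least squares:      ", w_ols)
```

Because the two columns are almost identical, the least squares coefficients are typically large and of opposite sign, while the ridge coefficients stay small and roughly split the true weight between the two columns.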
1.3 Overfitting
Figure 2 shows a normal fit that follows the trend of the data. Figure 3 fits the training set very closely, but it fails on unseen data: for example, when Size is large, the current fitted curve may predict a very small value, so the error relative to the actual value will be very large. This is overfitting. The penalty term in ridge regression counters it by shrinking the coefficients, which keeps the fitted curve smoother.
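A minimal sketch of the same effect on synthetic data (the sine curve, noise level, polynomial degree 12 and alpha=1e-3 are illustrative assumptions, not values from this article): a high-degree polynomial fitted by ordinary least squares usually matches the training points almost perfectly but scores worse on fresh data, while the ridge penalty typically narrows that gap.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)

def make_data(n):
    # Hypothetical ground truth: a smooth sine curve plus noise
    x = rng.uniform(-1, 1, size=(n, 1))
    y = np.sin(3 * x[:, 0]) + 0.2 * rng.normal(size=n)
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

# Degree-12 polynomial features: far more flexible than 20 points warrant
poly = PolynomialFeatures(degree=12)
P_train = poly.fit_transform(x_train)
P_test = poly.transform(x_test)

ols = LinearRegression().fit(P_train, y_train)
ridge = Ridge(alpha=1e-3).fit(P_train, y_train)

for name, model in [("OLS", ols), ("ridge", ridge)]:
    print(f"{name}: train R^2 = {model.score(P_train, y_train):.3f}, "
          f"test R^2 = {model.score(P_test, y_test):.3f}")
```

Ordinary least squares always has the highest training R² on a given feature set, since minimizing the training error is exactly its objective; it is the test R² that reveals overfitting.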
2 Ridge regression in sklearn
In the sklearn library, the ridge regression model is available as sklearn.linear_model.Ridge. Its main parameters are:
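- alpha: the regularization strength, corresponding to 𝜶 in the loss function; it must be non-negative, and larger values shrink the coefficients more strongly.
- fit_intercept: whether to compute the intercept; if False, no intercept is used (the data are expected to be centered).
- solver: the method used to compute the parameters; options include 'auto', 'svd', 'cholesky', 'sag', etc.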
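A minimal sketch of these parameters in use (the synthetic problem from make_regression and the alpha values are illustrative assumptions): different solvers should reach the same coefficients, and increasing alpha shrinks the coefficient vector toward zero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Hypothetical synthetic regression problem
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# The same model fitted with two different solvers
r_svd = Ridge(alpha=1.0, fit_intercept=True, solver="svd").fit(X, y)
r_chol = Ridge(alpha=1.0, fit_intercept=True, solver="cholesky").fit(X, y)
print("svd:     ", r_svd.coef_)
print("cholesky:", r_chol.coef_)

# Larger alpha means stronger regularization and a smaller coefficient norm
norms = []
for alpha in [0.1, 10.0, 1000.0]:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    norms.append(np.linalg.norm(coef))
    print(f"alpha={alpha}: ||w|| = {norms[-1]:.3f}")
```

The solver only changes how the coefficients are computed (speed, memory, support for sparse input), not the objective being minimized.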
3 Case study
Example: traffic flow prediction
3.1 Introduction to the data
The data are traffic monitoring records from a single intersection, giving the hourly traffic flow over a full year.
3.2 Purpose of the experiment
Polynomial features are created from the available attributes, and a ridge regression model, rather than an ordinary linear model, is used to perform polynomial regression on the traffic flow.
3.3 The data are characterized as follows
- HR: hour of the day (0-23)
- WEEK_DAY: day of the week (0-6)
- DAY_OF_YEAR: day of the year (1-365)
- WEEK_OF_YEAR: week of the year (1-53)
- TRAFFIC_COUNT: traffic flow
The full dataset contains 21,626 records.
4 Python Implementation
4.1 Code
#*================1. Create the project and import the sklearn toolkits====================**
import numpy as np
from sklearn.linear_model import Ridge  # ridge regression model from sklearn
from sklearn import model_selection  # model selection module (provides train_test_split)
import matplotlib.pyplot as plt  # plotting module
from sklearn.preprocessing import PolynomialFeatures  # creates polynomial features such as ab, a^2, b^2
#*=================2. Load the data=========================================**
data = np.genfromtxt('ridge regression.csv', delimiter=',')  # load the data from a CSV file with NumPy
print(data)
print()
plt.plot(data[:, 4])  # plot the traffic flow column
#plt.show()
#*================3. Process the data==========================================**
X = data[:, :4]  # columns 0-3: the attributes
y = data[:, 4]  # column 4: the traffic volume
poly = PolynomialFeatures(6)  # polynomial features up to degree 6; degree 6 was chosen after several trials
X = poly.fit_transform(X)  # X now holds the polynomial features
#*================4. Split into training and test sets=================================**
# test_size is the fraction of the data used for testing; random_state is the random seed
train_set_x, test_set_x, train_set_y, test_set_y = model_selection.train_test_split(X, y, test_size=0.3, random_state=0)
#*==============5. Create the regressor and train it===============================**
clf = Ridge(alpha=1.0, fit_intercept=True)  # create the ridge regression model
clf.fit(train_set_x, train_set_y)  # train the regressor on the training set
# Goodness of fit (R^2) on the test set; the author reports 0.7375.
# R^2 is at most 1 and has no lower bound; a model that always predicts the mean of y scores 0.
print(clf.score(test_set_x, test_set_y))
#*============6. Draw the fitted curve=========================================**
start = 100  # plot the fitted curve for samples 100 to 199
end = 200
y_pre = clf.predict(X)  # fitted values for all samples from the predict function
time = np.arange(start, end)
plt.figure()  # new figure, so the step-2 plot is not drawn on the same axes
plt.plot(time, y[start:end], 'b', label="real")  # real data in blue
plt.plot(time, y_pre[start:end], 'r', label='predict')  # fitted curve in red
plt.legend(loc='upper left')  # position of the legend
plt.show()
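The goodness of fit returned by clf.score is the coefficient of determination, R² = 1 - SS_res / SS_tot. A minimal sketch with made-up values checks this formula against sklearn.metrics.r2_score and shows why a constant prediction of the mean scores 0 while a worse constant scores below 0:

```python
import numpy as np
from sklearn.metrics import r2_score

def r2_manual(y_true, y_pred):
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([3.0, 5.0, 7.0, 9.0])       # made-up target values
y_good = np.array([2.8, 5.1, 7.2, 8.9])       # a close prediction
y_mean = np.full_like(y_true, y_true.mean())  # constant: always the mean
y_bad = np.full_like(y_true, 20.0)            # constant far from the data

for name, pred in [("good", y_good), ("mean", y_mean), ("bad", y_bad)]:
    print(name, r2_manual(y_true, pred), r2_score(y_true, pred))
# good: 0.995, mean: 0.0, bad: -39.2 (up to floating-point round-off)
```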
4.2 Results
This concludes the article "Artificial Intelligence - Python implementation of ridge regression".