This article describes the principle and implementation of training a linear model in Python with TensorFlow, shared for your reference. The details are as follows:
1. Relevant concepts
As an example, we abstract the distribution law y = kx + b from data that are roughly linearly distributed.
A feature is an input variable, i.e., the x variable in simple linear regression. A simple machine learning project may use a single feature, while a more complex one may use millions of features.
A label is what we are trying to predict, i.e., the y variable in simple linear regression.
A sample is a specific instance of data. Labeled samples are {features, label} pairs that are used to train the model and summarize its patterns. Unlabeled samples contain only the feature data x; the model predicts their y values.
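As a tiny illustration of the two kinds of samples (the values are hypothetical):

# A labeled sample carries both a feature and a label; an unlabeled one has the feature only
labeled_sample = {'feature': 1.5, 'label': 4.0}   # used to train the model
unlabeled_sample = {'feature': 2.0}               # its label is predicted by the model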
A model is a tool mapping features to labels, built through machine learning.
Training means letting the model learn from labeled samples to determine the ideal values of its parameters. Put simply: given some sample points (x, y), their pattern is summarized to determine the two parameters k and b in the model y = kx + b; the equation is then used to compute the corresponding y value when only x is given.
Loss is a number indicating how inaccurate the model's prediction is on a single sample: the greater the difference between the predicted value and the true value, the greater the loss. The process of examining samples and minimizing the model's loss is called empirical risk minimization. L1 loss is the absolute value of the difference between the predicted and actual label values. Squared loss is the square of the difference between the predicted and actual values.
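A minimal NumPy sketch of the two losses (the predicted and actual values are hypothetical):

import numpy as np

y_pred = np.array([2.5, 0.0, 2.1])           # hypothetical predicted values
y_true = np.array([3.0, -0.5, 2.0])          # hypothetical actual label values
print(np.abs(y_true - y_pred))               # L1 loss per sample
print(np.square(y_true - y_pred))            # squared loss per sample
print(np.mean(np.square(y_true - y_pred)))   # mean squared loss over the samples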
Training of the model is an iterative process: the parameters are first given initial values to obtain a preliminary model, the label values corresponding to the features are computed, and the loss is calculated by comparison. The parameters are then adjusted, prediction and loss calculation are repeated, and so on, until the total loss no longer changes or changes very slowly, at which point the model is said to have converged.
Similarly, for a quadratic function, the extremum is found by continually adjusting the value of x until the rate of change of the function is 0. Finding the extremum can be done with gradient descent: on a function curve, the fastest way to reach an extreme point is to explore in the direction in which the gradient decreases (the negative gradient).
Forward propagation: compute the output value from the input. Backpropagation: compute how much each internal variable should be adjusted according to the optimizer algorithm, starting at the output layer and working backward through each layer until the input layer is reached.
In gradient descent, the batch is the number of samples used to compute the gradient in a single iteration. Stochastic gradient descent randomly selects one sample at a time for the gradient computation.
So how far should each exploratory step move? This brings us to the learning rate: multiplying the gradient by the learning rate gives the position of the next point, and this distance is also called the step size. If the step size is too small, it may take many iterations to reach the target point; if it is too large, the target point may be overshot.
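A minimal gradient descent sketch, assuming the toy function f(x) = (x - 3)² with gradient 2(x - 3) and a hypothetical learning rate:

learning_rate = 0.1   # hypothetical step-size choice
x = 0.0               # starting point of the exploration
for _ in range(50):
    grad = 2 * (x - 3)            # gradient of f at the current x
    x = x - learning_rate * grad  # step in the negative gradient direction
print(x)                          # converges toward the minimum at x = 3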
Parameters of this kind, which must be set by a human before learning rather than obtained through training, are called hyperparameters. Hyperparameters are the knobs programmers use to tune machine learning.
2. Algorithm design and training
The main steps in training with TensorFlow are: prepare the data, build the model, train the model, and make predictions.
Prepare the data
The data used can be processed from real-world data or generated artificially, for example by generating random points near the line y = 2x + 1:
# Set inline image display in Jupyter, otherwise the figure is not shown
%matplotlib inline
import tensorflow as tf
import numpy as np               # an open-source numerical computing extension for Python
import matplotlib.pyplot as plt  # a plotting library for Python

np.random.seed(5)                # fix the seed of the pseudo-random numbers
# The data arrays are named sx/sy so the placeholders defined later can be named x/y
sx = np.linspace(-1, 1, 100)     # 100 evenly spaced points between -1 and 1 as x coordinates
# Generate y coordinates from y = 2*x + 1 + noise
# randn(100) draws 100 samples from a standard normal distribution; 0.4 is the jitter magnitude
sy = 2 * sx + 1.0 + np.random.randn(100) * 0.4
plt.scatter(sx, sy)                                 # draw the scatter plot
plt.plot(sx, 2 * sx + 1, color='red', linewidth=3)  # draw the line y = 2x + 1
The scatter plot of the artificial data, together with the line, is drawn in Jupyter as follows:
Build the model
# Define the function model, y = kx + b
def model(x, k, b):
    return tf.multiply(k, x) + b

# Define the parameter variables in the model and give them initial values
k = tf.Variable(1.0, name='k')
b = tf.Variable(0.0, name='b')

# Define placeholders for the training data: x is the feature, y is the label
x = tf.placeholder(tf.float32, name='x')
y = tf.placeholder(tf.float32, name='y')

# The predicted value yp for the feature x is computed by the model
yp = model(x, k, b)
The initial values of k and b do not affect the final result, so they can be set to any value. Note that the placeholders are named x and y, while the generated data arrays are sx and sy, so the two do not conflict.
Train the model
# Train the model: set the training parameters (number of epochs, learning rate)
train_epoch = 10
rate = 0.05

# Define the mean squared error as the loss function
loss = tf.reduce_mean(tf.square(y - yp))
# Define the gradient descent optimizer, passing it the learning rate and the loss function
optimizer = tf.train.GradientDescentOptimizer(rate).minimize(loss)

ss = tf.Session()
init = tf.global_variables_initializer()
ss.run(init)

# Iterate for several epochs; in each epoch feed the samples into the model one by one,
# run the gradient descent optimization to update the parameters, and plot the model line
for _ in range(train_epoch):
    for x1, y1 in zip(sx, sy):
        ss.run([optimizer, loss], feed_dict={x: x1, y: y1})
    tmp_k = k.eval(session=ss)
    tmp_b = b.eval(session=ss)
    plt.plot(sx, tmp_k * sx + tmp_b)
plt.show()
The number of epochs, i.e., how many times the model is trained over the data, is specified manually. The learning rate should be neither too large nor too small; based on experience it is generally set between 0.01 and 0.1.
The mean squared error is used as the loss function: tf.square() computes the square of y - yp, and tf.reduce_mean() then takes the mean.
The manually generated data is then fed into the placeholders: the zip() function pairs the elements of sx and sy one by one, the 100 resulting pairs are iterated over to fill the placeholders x and y, and the session runs the optimizer for iterative training.
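A quick look at what zip() produces here, with hypothetical demo values:

sx_demo = [0.1, 0.2, 0.3]
sy_demo = [1.2, 1.4, 1.6]
print(list(zip(sx_demo, sy_demo)))   # [(0.1, 1.2), (0.2, 1.4), (0.3, 1.6)]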
The results of the run are shown below; the predicted line gradually fits the distribution of the scattered points.
Make predictions
According to the function model y = kx + b, substituting the learned parameters k and b together with a feature value x yields the predicted value of the label y.
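As a minimal sketch (x_test is a hypothetical input), the trained parameter values can be read out of the session and substituted directly:

x_test = 0.5   # hypothetical feature value
y_pred = k.eval(session=ss) * x_test + b.eval(session=ss)
print('predicted y for x=%.2f: %.4f' % (x_test, y_pred))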
3. Array operations
NumPy is a Python library that supports large multidimensional arrays and matrix operations; with it, data can easily be converted to arrays and manipulated. The shape attribute of a NumPy array gives its dimensional composition. An array can be converted to a target shape by transposing it with .T, or with reshape(), e.g. reshape(3, 2). Examples are as follows:
import numpy as np

scalar = 1
scalar_np = np.array(scalar)   # convert the scalar to a NumPy array
print(scalar_np.shape)         # only NumPy arrays have the shape attribute; a scalar's shape is ()

# Only ordered arrays of two or more dimensions can be viewed as matrices
matrix = [[1, 2, 3], [4, 5, 6]]
matrix_np = np.array(matrix)   # convert the list to a NumPy matrix
print('Two-dimensional list:', matrix)   # printed as a single-line list
print('Matrix form:\n', matrix_np)       # printed as a multi-row matrix
print('Matrix transpose:\n', matrix_np.T)
print('shape value:', matrix_np.shape)
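The reshape() conversion mentioned above can be sketched like this, reusing matrix_np from the example:

print(matrix_np.reshape(3, 2))   # the 2x3 array rearranged into 3 rows and 2 columns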
Matrices can be added, subtracted, and multiplied element-wise with +, -, and *, provided the two matrices have the same shape. Matrices can also be matrix-multiplied, provided the number of columns of the former equals the number of rows of the latter, as in the example below:
ma = np.array([[1, 2, 3], [4, 5, 6]])
mb = np.array([[1, 2], [3, 4], [5, 6]])
print(ma + ma)
print(ma * ma)             # element-wise product
print(np.matmul(ma, mb))   # matrix multiplication: (2x3) x (3x2) gives a 2x2 result
4. Multiple linear regression models
A multiple linear regression model extends the one-variable linear function y = kx + b: for different feature values x1, x2, ..., xn, the single parameter k is expanded to many, i.e. y = k1x1 + k2x2 + ... + knxn + b, and the task becomes solving for the n + 1 parameters. The product of the n parameters k with the features x can be viewed as a matrix multiplication. For example, the following is a simple house price prediction model: there are 12 feature values x1~x12 that affect the house price, the corresponding label is the price, and the multiple linear model solves for the parameters k1~k12 and b so that the price can be predicted:
%matplotlib notebook
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.utils import shuffle

# Read the data csv file using pandas
data = pd.read_csv('D:/Temp/data/', header=0)
# Display a summary description of the data
# print(data.describe())
data = data.values               # convert the values of data to a NumPy array
for i in range(12):              # normalize all feature columns
    data[:, i] = data[:, i] / (data[:, i].max() - data[:, i].min())
x_data = data[:, :12]            # all rows, columns 0 through 11, as the feature values x
y_data = data[:, 12]             # all rows, column 12, as the label values y

x = tf.placeholder(tf.float32, [None, 12], name='x')  # None: the number of rows is not fixed; each row has 12 feature values
y = tf.placeholder(tf.float32, [None, 1], name='y')

with tf.name_scope('Model'):     # define a namespace
    k = tf.Variable(tf.random_normal([12, 1], stddev=0.01), name='k')
    b = tf.Variable(1.0, name='b')
    def model(x, k, b):
        return tf.matmul(x, k) + b   # matrix-multiply x ([None,12]) by k ([12,1]) and add b
    yp = model(x, k, b)

# Define the hyperparameters: number of epochs and learning rate, plus the loss function
train_epochs = 50
learning_rate = 0.01
with tf.name_scope('Loss'):
    loss_function = tf.reduce_mean(tf.square(y - yp))
# Define the optimizer using gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss_function)

ss = tf.Session()
init = tf.global_variables_initializer()
ss.run(init)

loss_list = []
for _ in range(train_epochs):
    loss_sum = 0
    for (xs, ys) in zip(x_data, y_data):
        xs = xs.reshape(1, 12)   # adjust the dimensions of the data to match placeholder x
        ys = ys.reshape(1, 1)
        _, loss = ss.run([optimizer, loss_function], feed_dict={x: xs, y: ys})
        loss_sum += loss
    x_data, y_data = shuffle(x_data, y_data)   # shuffle the order of the data after each epoch
    k_tmp = k.eval(session=ss)
    b_tmp = b.eval(session=ss)
    print('k:', k_tmp, ', b:', b_tmp)
    loss_avg = loss_sum / len(y_data)   # average loss of this epoch
    loss_list.append(loss_avg)
plt.plot(loss_list)
Notes:
pandas is a Python library that provides high-performance, easy-to-use data structures and data analysis tools. It can read data from csv, excel, txt, sql, and other files, and its data structures convert readily into NumPy multidimensional arrays.
When training a multiple linear regression model with gradient descent, if the ranges of different feature values differ too much (e.g., some features lie in 0.3~0.7 while others lie in 300~700), the training results will be affected. The data therefore needs to be normalized, i.e., each feature is divided by (max - min) of that feature, scaling all features into comparable ranges.
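A minimal NumPy sketch of this scaling on a hypothetical feature column; note that the code above divides only by (max - min), while the common min-max variant also subtracts the minimum so the results land exactly in [0, 1]:

import numpy as np

col = np.array([300.0, 450.0, 520.0, 700.0])          # hypothetical feature column
print(col / (col.max() - col.min()))                  # the scaling used in the code above
print((col - col.min()) / (col.max() - col.min()))    # min-max scaling into [0, 1]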
tf.name_scope() defines a namespace; names defined inside it are valid only within the current scope, preventing naming conflicts.
When the variable k is initialized, tf.random_normal() draws a tensor of shape [12, 1] from a normal distribution with standard deviation stddev=0.01.
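A minimal sketch of inspecting such an initial value (TensorFlow 1.x style, matching the code above):

init_k = tf.random_normal([12, 1], stddev=0.01)
with tf.Session() as s:
    print(s.run(init_k))   # a 12x1 tensor of small values near 0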
Since the placeholder x was defined as a two-dimensional tensor of shape [None, 12], when filling in the data each xs must be rearranged with reshape(1, 12) into a two-dimensional array of 1 row and 12 columns; similarly, each ys must be converted with reshape(1, 1).
The implementation defines loss_list to save the loss values; after each epoch the average loss is appended to loss_list, which is finally plotted as a curve. You can see that the loss drops rapidly at first and eventually levels off.
The results of the run are as follows, showing some of the parameter values as well as the loss curve:
I hope this article is helpful to readers in their Python programming.