1. Limitations of logistic regression
Logistic regression applies a sigmoid function to a linear function of the input and classifies based on the result, so its decision boundary is a straight line on the graph. In cases like the one below, however, no matter how the line is drawn, a straight line cannot separate the two classes completely.
However, if we first transform the input features, perfect classification becomes possible. For example:
Create a new feature x1: the distance to (0,0), and another new feature x2: the distance to (1,1). We can then compute the new features for each of the four points and plot them on the coordinate system shown in the right-hand figure below. Once transformed in this way, the data can be fed into a logistic regression that separates them completely, as the sketch below illustrates.
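To make this concrete, here is a minimal sketch in Python, assuming the four points form the usual XOR layout, with (0,0) and (1,1) in one class and (0,1) and (1,0) in the other (the class assignment is read off the figure and is an assumption here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumed XOR-style data: (0,0) and (1,1) are class 0, (0,1) and (1,0) are class 1
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([0, 0, 1, 1])

# In the raw feature space no straight line separates the classes,
# so plain logistic regression cannot reach 100% training accuracy
print(LogisticRegression().fit(X, y).score(X, y))

# Transformed features: x1 = distance to (0,0), x2 = distance to (1,1)
X_new = np.column_stack([
    np.linalg.norm(X - np.array([0.0, 0.0]), axis=1),
    np.linalg.norm(X - np.array([1.0, 1.0]), axis=1),
])

# In the transformed space a single logistic regression separates them perfectly
clf = LogisticRegression().fit(X_new, y)
print(clf.score(X_new, y))  # 1.0
```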
Although it is not easy to design such a transformation by hand, we can find it with logistic regression itself: a first logistic regression unit produces the first transformed feature x1, a second unit produces the second transformed feature x2, and these two are fed as new inputs into a third logistic regression, which completes the classification.
To do this, we adjust the parameters so that each input point (x1, x2) is mapped to a pair of values between 0 and 1 (strictly speaking these are just numbers between 0 and 1, but we tentatively call them probabilities), as shown in the figure below. The point in the upper-left corner, for example, maps to (0.73, 0.05), and each of the other points likewise maps to its own pair of values. Plotting these pairs on the coordinate axes completes the feature transformation; feeding the transformed results into a new logistic regression then completes the classification.
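As a sketch of such a cascade in Python: the weight values below are hand-picked assumptions chosen so that the three units together solve the XOR-style case above (in practice all three units' parameters would be learned jointly):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(x, w, b):
    # One logistic regression unit: a sigmoid applied to a linear function
    return sigmoid(np.dot(w, x) + b)

# Hand-picked (assumed) parameters for the three units
w1, b1 = np.array([20.0, 20.0]), -10.0   # unit 1: roughly "x1 OR x2"
w2, b2 = np.array([20.0, 20.0]), -30.0   # unit 2: roughly "x1 AND x2"
w3, b3 = np.array([20.0, -20.0]), -10.0  # unit 3: classifies on (h1, h2)

def cascade(x):
    # The outputs of the first two units become the inputs of the third
    h = np.array([logistic_unit(x, w1, b1), logistic_unit(x, w2, b2)])
    return logistic_unit(h, w3, b3)

for point in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(point, round(float(cascade(np.array(point, dtype=float))), 3))
# (0,0) and (1,1) come out near 0; (0,1) and (1,0) come out near 1
```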
2. Introduction to deep learning
It can be seen that each logistic regression unit can act either as a receiver, taking in input data, or as a sender, passing its output on as input to other logistic regression units.
Multiple logistic regression units wired together in this way form a neural network, and each individual logistic regression unit is a neuron. Learning with such networks is called deep learning.
Here is an example:
Suppose the initial input data are 1 and -1, and all the weights are known to us; for example, the weights from the two inputs to the two neurons in the first layer are 1, -1, -2, 1. After adding the bias terms and applying the sigmoid function, the results work out to 0.98 and 0.12. Similarly, if we know all the weights (parameters) that follow, we end up with two outputs, 0.62 and 0.83.
When the initial inputs are 0 and 0 instead, the same sequence of transformations gives the outputs 0.51 and 0.85. As you can see, whatever the inputs are, a series of transformations through a series of parameters turns them into outputs with completely different characteristics.
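As a check on the first layer of that example, here it is in Python; the bias values (1 and 0) are assumptions chosen so the outputs match the 0.98 and 0.12 quoted above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Inputs and first-layer weights from the example; each row of W1 holds
# the weights into one neuron (the 1, -1, -2, 1 from the text, rearranged)
x = np.array([1.0, -1.0])
W1 = np.array([[1.0, -2.0],
               [-1.0, 1.0]])
b1 = np.array([1.0, 0.0])  # assumed biases

print(sigmoid(W1 @ x + b1))  # approximately [0.98 0.12]
```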
Thus, the entire network can be viewed as a single function. More generally, as shown in the figure below, each circle is a neuron. The layer that receives the input is called the input layer; the last layer, whose neurons are not connected to any further neurons, is called the output layer; and all the layers in between are called hidden layers. A network like the one below, in which every neuron is connected to all the neurons in the next layer, is called a fully connected neural network.
3. Deep learning computation
For deep learning, calculations are usually performed using matrix operations.
More generally:
That is, each layer multiplies its weight matrix by the output values of the previous layer, adds the bias term, and applies a sigmoid function to the whole; the result is this layer's output, which serves as input for the next layer. The same operation is repeated for every layer up to the output layer.
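In code, the entire forward pass is a short loop over the layers. The 2-3-2 shape and random parameters below are placeholders just to show the matrix operations:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    # Each layer computes a = sigmoid(W @ a_prev + b) and passes a onward
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# Hypothetical 2-3-2 network with random parameters
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 2)), rng.standard_normal((2, 3))]
biases = [rng.standard_normal(3), rng.standard_normal(2)]
print(forward(np.array([1.0, -1.0]), weights, biases))
```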
4. Loss function of a neural network
For a sample, the loss function is shown below:
For example, suppose the input is a sample of the digit "1" with 256 pixels, i.e., 256 features, which are fed into the neural network. The final output is a 10-dimensional vector with a probability value in each dimension: for example, 0.8 for "1", 0.1 for "2", and so on. The actual label is "1", so only ŷ1 is 1 and all the other entries are 0. Taking the cross-entropy of these two vectors and summing, as in the equation in the figure above, gives C, the loss for this sample.
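That per-sample cross-entropy is C = -Σi ŷi ln yi, where ŷ is the one-hot label vector and y is the network's output. A sketch, with the output vector hypothetical apart from the 0.8 and 0.1 quoted above:

```python
import numpy as np

def cross_entropy(y_hat, y):
    # C = -sum_i yhat_i * ln(y_i); only the true class contributes
    return -np.sum(y_hat * np.log(y))

# Hypothetical 10-class output: 0.8 for "1", 0.1 for "2",
# with the remaining mass spread over the other eight digits
y = np.array([0.8, 0.1] + [0.0125] * 8)

y_hat = np.zeros(10)
y_hat[0] = 1.0  # one-hot label: digit "1" assumed at index 0

print(cross_entropy(y_hat, y))  # -ln(0.8), approximately 0.223
```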
For the whole training set, it is sufficient to sum the per-sample losses C over all samples to obtain the total loss.