
An introduction to the meaning of the parameters required by the automatic differentiation function backward() in PyTorch

Normally backward() has to be passed an argument, and I could never quite understand what that argument actually means. But no matter; life is about tinkering, so let's tinker with it, hehe.

Automatic differentiation for scalars

First of all, if the out in out.backward() is a scalar (the example below is equivalent to a neural network with one sample that has two attributes and a single output), then backward() does not need to be given any argument at all.

import torch
from torch.autograd import Variable

a = Variable(torch.Tensor([2, 3]), requires_grad=True)
b = a + 3
c = b * 3
out = c.mean()
out.backward()
print('input:')
print(a.data)
print('output:')
print(out.data.item())
print('input gradients are:')
print(a.grad)

Run results: the input is [2., 3.], the output is 16.5, and the input gradients are [1.5, 1.5].

It is not hard to see that the function we constructed is

out = ( 3*(a1+3) + 3*(a2+3) ) / 2

so its derivative is also easy to see:

∂out/∂a1 = ∂out/∂a2 = 3/2 = 1.5

This is the result of automatic differentiation with a scalar output, and it matches the printed gradients.
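As a quick cross-check, here is the same computation written against the modern tensor API (recent PyTorch versions no longer need the Variable wrapper); a minimal sketch, not part of the original example:

import torch

# out = mean(3 * (a + 3)), so d(out)/d(a_i) = 3/2 = 1.5 for each element of a
a = torch.tensor([2., 3.], requires_grad=True)
out = (3 * (a + 3)).mean()
out.backward()      # scalar output: no gradient argument needed
print(a.grad)       # expected: tensor([1.5000, 1.5000])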

Automatic differentiation for vectors

If the out in out.backward() is a vector (or, if you like, a 1xN matrix), let's run automatic differentiation on a vector and see what happens.

Start by constructing the following model (equivalent to a neural network with one sample that has two attributes, and two outputs):

import torch
from torch.autograd import Variable

a = Variable(torch.Tensor([[2., 4.]]), requires_grad=True)
b = torch.zeros(1, 2)
b[0, 0] = a[0, 0] ** 2
b[0, 1] = a[0, 1] ** 3
out = 2 * b
# The argument passed to backward must be a matrix with the same shape as out
out.backward(torch.Tensor([[1., 1.]]))
print('input:')
print(a.data)
print('output:')
print(out.data)
print('input gradients are:')
print(a.grad)

The model is also simple, and it is not hard to see what the Jacobian of out with respect to a should be:

[[∂out1/∂a1, ∂out1/∂a2], [∂out2/∂a1, ∂out2/∂a2]] = [[4*a1, 0], [0, 6*a2^2]]

Since a1 = 2 and a2 = 4, the matrix above should be [[8, 0], [0, 96]].

The result of the run: the output is [[8., 128.]] and the input gradients are [[8., 96.]].

Well, it really is 8 and 96, but if you think about it, this is not the same shape as the Jacobian matrix we wanted. Could it be that backward automatically omits the zeros?

Let's go ahead and try it, this time with small modifications to the previous model, as follows:

import torch
from torch.autograd import Variable

a = Variable(torch.Tensor([[2., 4.]]), requires_grad=True)
b = torch.zeros(1, 2)
b[0, 0] = a[0, 0] ** 2 + a[0, 1]
b[0, 1] = a[0, 1] ** 3 + a[0, 0]
out = 2 * b
# The argument passed to backward must be a matrix with the same shape as out
out.backward(torch.Tensor([[1., 1.]]))
print('input:')
print(a.data)
print('output:')
print(out.data)
print('input gradients are:')
print(a.grad)

It can be seen that this model's Jacobian should be [[4*a1, 2], [2, 6*a2^2]] = [[8, 2], [2, 96]].

Run it, and the input gradients come out as [[10., 98.]].

Wait, what? Normally it should have been the full 2x2 Jacobian [[8, 2], [2, 96]], shouldn't it?

What? Who am I? Where am I? Why do I get only two numbers, and why are they 8 + 2 = 10 and 96 + 2 = 98? Is each column simply being summed? Think about it: we passed [[1, 1]] as the argument to backward, so could that argument be what controls the sum? Let's try a different argument and change only the value passed in to [[1, 2]]:

import torch
from torch.autograd import Variable

a = Variable(torch.Tensor([[2., 4.]]), requires_grad=True)
b = torch.zeros(1, 2)
b[0, 0] = a[0, 0] ** 2 + a[0, 1]
b[0, 1] = a[0, 1] ** 3 + a[0, 0]
out = 2 * b
# The argument passed to backward must be a matrix with the same shape as out
out.backward(torch.Tensor([[1., 2.]]))
print('input:')
print(a.data)
print('output:')
print(out.data)
print('input gradients are:')
print(a.grad)

Well, this time it makes sense: the gradients come out as [[12., 194.]]. The argument we pass in performs a linear operation on the Jacobian matrix that would normally be derived from the original model. My first thought was to treat the argument (call it arg) as a column vector, so that the result would be J · arg.

For this example that would give

[[8, 2], [2, 96]] · [[1], [2]] = [[12], [194]]

Everything seemed perfectly explained, but just as I typed this up I realized that the official documentation says the argument passed to backward() should have the same dimensions as out, so the interpretation above cannot be right. Where did it go wrong?

On closer inspection, it turns out that when performing this linear operation on the Jacobian, the argument arg should be treated as a row vector (not a column vector), so that the result is arg · J.

That is:

[[1, 2]] · [[8, 2], [2, 96]] = [[12, 194]]

This time everything is properly explained.
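To double-check the "row vector times Jacobian" reading, here is a small sketch (not in the original example) that multiplies the hand-computed Jacobian of the model above by the argument we passed to backward:

import torch

# Hand-computed Jacobian of the model above at a1 = 2, a2 = 4:
# row 0 holds d(out1)/d(a1), d(out1)/d(a2); row 1 holds d(out2)/d(a1), d(out2)/d(a2)
J = torch.tensor([[8., 2.],
                  [2., 96.]])
arg = torch.tensor([[1., 2.]])   # the argument we passed to backward()
print(torch.mm(arg, J))          # tensor([[ 12., 194.]]) -- matches a.grad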

Now let's output the Jacobian matrix itself. To avoid any ambiguity, let's make every entry of the Jacobian different (the reason the earlier analysis went wrong was precisely that the Jacobian contained repeated values), so the model is altered slightly as follows:

import copy
import torch
from torch.autograd import Variable

a = Variable(torch.Tensor([[2., 4.]]), requires_grad=True)
b = torch.zeros(1, 2)
b[0, 0] = a[0, 0] ** 2 + a[0, 1]
b[0, 1] = a[0, 1] ** 3 + a[0, 0] * 2
out = 2 * b
# Pass one-hot rows so each backward call extracts one row of the Jacobian;
# retain_graph=True keeps the graph alive for the second backward pass
out.backward(torch.Tensor([[1., 0.]]), retain_graph=True)
A_temp = copy.deepcopy(a.grad)
a.grad.data.zero_()
out.backward(torch.Tensor([[0., 1.]]))
B_temp = a.grad
print('jacobian matrix is:')
print(torch.cat((A_temp, B_temp), 0))

If everything is right, our Jacobian matrix should be [[8, 2], [4, 96]].

Okay, here's the moment to witness the miracle, don't blink, don't blink... 3, 2, 1, bang! The printed Jacobian is indeed [[8., 2.], [4., 96.]].
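As an aside, newer PyTorch releases (around 1.5 and later, if I remember correctly) also provide torch.autograd.functional.jacobian, which builds the same matrix directly; here is a minimal sketch with the model above rewritten as a function:

import torch

# The same model as above, rewritten as a function of a
def f(a):
    b0 = a[0, 0] ** 2 + a[0, 1]
    b1 = a[0, 1] ** 3 + a[0, 0] * 2
    return 2 * torch.stack([b0, b1]).reshape(1, 2)

a = torch.tensor([[2., 4.]])
J = torch.autograd.functional.jacobian(f, a)
print(J.reshape(2, 2))   # expected: tensor([[ 8.,  2.], [ 4., 96.]])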

Now let's summarize. After passing through a complex neural network, every value in out is a linear or non-linear combination of the attributes of the input samples (i.e. the input data), so every value in out is related to every value of the input data; in other words, every number in out can be differentiated with respect to every number in a. The meaning of the argument [k1, k2, k3, ..., kn] of backward() is then

a.grad[j] = k1*∂out1/∂a[j] + k2*∂out2/∂a[j] + ... + kn*∂outn/∂a[j]

It can also be interpreted as the weight of each component of out when differentiating with respect to a.

Automatic differentiation for matrices

Now, what if out is a matrix?

The following example can be understood as a neural network with two samples, each with two attributes, and two output values (one per sample).

import torch
from torch.autograd import Variable
from torch import nn

a = Variable(torch.Tensor([[2, 3], [1, 2]]), requires_grad=True)
w = Variable(torch.randn(2, 1), requires_grad=True)
out = torch.mm(a, w)
out.backward(torch.Tensor([[1.], [1.]]), retain_graph=True)
print("gradients are:{}".format(a.grad))

If the previous examples made sense, this one is also easy to understand: the argument k passed to backward is a 2x1 matrix, where 2 is the number of samples; as before, a weighted sum is taken for each sample. Since out = a·w, each partial derivative ∂out_i/∂a_ij is simply w_j, so with k = [[1.], [1.]] every row of a.grad comes out equal to w transposed (the exact numbers change from run to run because w is random). The result is checked below.
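Because w is random, the printed numbers differ from run to run, so here is the same check with a hypothetical fixed w (the values 0.5 and 2.0 are made up purely for illustration):

import torch

# out = a @ w, so d(out_i)/d(a_ij) = w_j; with k = [[1.], [1.]] the gradient
# left in a.grad should equal k @ w.T, i.e. one copy of w per sample (row)
w = torch.tensor([[0.5], [2.0]])            # hypothetical fixed weights
k = torch.tensor([[1.], [1.]])              # the argument passed to backward()
a = torch.tensor([[2., 3.], [1., 2.]], requires_grad=True)
out = torch.mm(a, w)
out.backward(k)
print(a.grad)               # both rows should be [0.5, 2.0]
print(torch.mm(k, w.t()))   # the same matrix, computed by hand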

If you are interested, you can also extend this to a multi-class problem with multiple samples; my guess is that the shape of k should then be [number of input samples x number of classes], as in the sketch below.
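Here is a small sketch to test that guess: two samples with two attributes each, mapped by a hypothetical random weight matrix to three outputs per sample, so k must have shape (2, 3):

import torch

# Two samples with two attributes each, three outputs per sample, so out has
# shape (2, 3) and backward() expects a gradient argument of the same shape
a = torch.tensor([[2., 3.], [1., 2.]], requires_grad=True)
w = torch.randn(2, 3)          # hypothetical weights: 2 attributes -> 3 outputs
out = torch.mm(a, w)           # shape (2, 3)
k = torch.ones(2, 3)           # [number of samples, number of classes]
out.backward(k)
print(a.grad.shape)            # torch.Size([2, 2]) -- same shape as a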

Well, the PyTorch automatic differentiation behaviour that had been bothering me for a long time is now completely understood.

That is everything I have to share about the meaning of the parameters required by PyTorch's automatic differentiation function backward(). I hope it gives you a useful reference, and I hope you will continue to support me.