
How the backward() method in PyTorch automatically computes gradients

1. Distinguish between the source tensor and the resultant tensor

import torch

x = torch.arange(-8.0, 8.0, 0.1, requires_grad=True)   # source tensor
y = x ** 2    # resultant tensor (x ** 2 is an illustrative choice; the original function of x is not shown)

x is the source tensor, and the tensor y obtained from the source tensor x is the resultant tensor.

2. How to use the backward() method to compute the gradient automatically

Calling the backward() method on a scalar automatically computes the gradient of the source tensor according to the chain rule.
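For instance, a minimal sketch of the scalar case (the value 3.0 and the function x ** 2 are chosen purely for illustration):

import torch

x = torch.tensor(3.0, requires_grad=True)   # source tensor
y = x ** 2                                  # y is already a scalar
y.backward()                                # chain rule: dy/dx = 2x
print(x.grad)                               # tensor(6.)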

2.1 The resultant tensor is a one-dimensional tensor

Building on the example above, the one-dimensional tensor y must first be reduced to a scalar; calling the backward() method on that scalar then automatically computes the gradient of x.

So how do you turn a one-dimensional tensor y into a scalar?

This is generally done by summing the one-dimensional tensor y, i.e. y.sum().

A one-dimensional tensor is a vector, and summing it is equivalent to taking the dot product of that vector with an all-ones vector of the same length. Differentiating the scalar y.sum() with respect to the source tensor x therefore gives the same result as differentiating each element of y with respect to the corresponding element of x, so the summation has no effect on the gradient of the source tensor x.

Therefore, the code is as follows:

y.sum().backward()
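Putting the pieces together, a minimal runnable sketch of the one-dimensional case (continuing with x ** 2 as an illustrative choice for y):

import torch

x = torch.arange(-8.0, 8.0, 0.1, requires_grad=True)   # source tensor
y = x ** 2                                              # one-dimensional resultant tensor

# Summing y is the same as a dot product with an all-ones vector of equal length
print(torch.allclose(y.sum(), torch.dot(y, torch.ones_like(y))))   # True

y.sum().backward()     # reduce y to a scalar, then backpropagate
print(x.grad)          # equals 2 * x, i.e. dy_i/dx_i for each element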

2.2 The resultant tensor is a two-dimensional tensor or higher

Moving beyond the example above, the resultant tensor y may be two-dimensional or of even higher dimension. In that case, it can be understood as taking a generalized dot product with an all-ones tensor of the same shape (strictly speaking, the dot product is a vector concept; it is used here only for ease of understanding).

The code is as follows:

y.backward(torch.ones_like(y))    # i.e. gradient=torch.ones_like(y)
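As a minimal sketch of the higher-dimensional case (the 2-by-3 shape and the function x ** 2 are illustrative choices, not from the original example):

import torch

x = torch.randn(2, 3, requires_grad=True)   # source tensor
y = x ** 2                                  # two-dimensional resultant tensor

# Pass an all-ones tensor of the same shape as the starting gradient
y.backward(torch.ones_like(y))
print(x.grad)                               # equals 2 * x, element by element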

Gradient computation in pytorch

What is a gradient?

For a function of one variable, the gradient at a point is simply the derivative at that point. For a function of several variables, the gradient at a point is the vector made up of the partial derivatives with respect to each independent variable.

In the earlier linear regression example y = wx + b, finding the optimal value of the parameter w requires taking the partial derivative of the loss with respect to w and then adjusting w according to that partial derivative until the optimal solution is found.
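As a rough sketch of that idea, here is a single gradient-descent step on w and b with a mean-squared-error loss (the data and learning rate are made up purely for illustration):

import torch

x = torch.tensor([1.0, 2.0, 3.0])            # made-up inputs
y_true = torch.tensor([3.0, 5.0, 7.0])       # made-up targets (generated from w=2, b=1)

w = torch.tensor(0.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)
lr = 0.1                                     # learning rate

loss = ((w * x + b - y_true) ** 2).mean()    # mean squared error
loss.backward()                              # fills w.grad and b.grad

with torch.no_grad():                        # adjust the parameters by their partial derivatives
    w -= lr * w.grad
    b -= lr * b.grad
print(w, b)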

Automatic calculation of gradient and partial derivatives

You can use the backward() method in PyTorch to automatically compute the gradient.

When defining a tensor, you can specify requires_grad=True to indicate that the tensor can be evaluated for partial derivatives.

import torch
# Create a random tensor x for which partial derivatives can be computed
x = torch.randn(1, requires_grad=True)
# The y and z tensors cannot have partial derivatives computed for them
y = torch.randn(1)
z = torch.randn(1)
# f1 contains a tensor that allows partial derivatives
f1 = 2*x + y
# f2 contains no tensor that allows partial derivatives
f2 = y + z
# Print the gradient of the two equations
print(f1.grad_fn)
print(f2.grad_fn)

Drawing conclusions:

  • f1 has a grad_fn because it was built from a tensor that allows partial derivatives, so its gradient can be computed; f2 does not (its grad_fn is None)
  • grad_fn records the operation that produced the tensor, and autograd uses it to compute the gradient

1. Find the partial derivative of x

# First call backward() on the result tensor to backpropagate
f1.backward()
# Then read the value of the partial derivative from the tensor's grad attribute
print(x.grad)    # for f1 = 2*x + y, this is tensor([2.])

2. Stopping gradient computation

Tensor.requires_grad_(False)

# Create a tensor for which partial derivatives can be computed
a = torch.randn(2, 2, requires_grad=True)
# Compute the b variable from a
b = (a * 3) / (a - 1)
# View the grad_fn
print(b.grad_fn)
# Stop the a tensor from allowing partial derivatives
a.requires_grad_(False)
# Compute the b variable again
b = (a * 3) / (a - 1)
# It's None now
print(b.grad_fn)

3. Get a tensor with the same contents as one that allows partial derivatives, where the new tensor cannot have partial derivatives computed

Tensor.detach() method

a = torch.randn(2, 2, requires_grad=True)
# detach() returns a tensor with the same contents that cannot have partial derivatives computed
b = a.detach()
print(a.requires_grad)
print(b.requires_grad)

4. Disabling partial derivative computation within a scope

Within the entire scope of a with torch.no_grad(): block, tensors cannot have partial derivatives computed.

a = torch.randn(2, 2, requires_grad=True)
print((a ** 2).requires_grad)
with torch.no_grad():
    print((a ** 2).requires_grad)

Gradient clearing

In PyTorch, if we use the backward() method to compute the gradient of a tensor, running it multiple times causes the computed gradients to accumulate.

So when computing the partial derivatives of a tensor, the gradient should be cleared after each computation and parameter update.

Without clearing the gradient:

x = torch.ones(4, requires_grad=True)
y = (2*x + 1).sum()
z = (2*x).sum()
y.backward()
print("First partial derivative:", x.grad)
z.backward()
print("Second partial derivative:", x.grad)

The second result is the accumulation of the two: the first backward() gives a gradient of 2 for each element, and the second adds another 2, giving 4.

Use the tensor.grad.zero_() method to clear the calculation of the gradient:

x = torch.ones(4, requires_grad=True)
y = (2*x + 1).sum()
z = (2*x).sum()
y.backward()
print("First partial derivative:", x.grad)
x.grad.zero_()
z.backward()
print("Second partial derivative:", x.grad)

Summary

The above is based on my personal experience. I hope it can serve as a useful reference, and I hope you will continue to support me.