
How to build a model using Pytorch

1 Model definition

Much like TF, Pytorch builds models by subclassing a base class (nn.Module) and implementing two methods. In TF these are __init__() and call(); in Pytorch they are __init__() and forward(). They play similar roles: one initializes the internal structure of the model, the other performs inference. Other functions, such as one for computing the loss or a training step, can also be added to the subclass, but this is optional. The following demo classifies MNIST handwritten digits; first, the model code:

import numpy as np
import matplotlib.pyplot as plt
import torch
from torch import nn, optim
from torchsummary import summary
from keras.datasets import mnist
from keras.utils import to_categorical

device = torch.device('cuda')  #——————1——————

class ModelTest(nn.Module):
    def __init__(self, device):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Flatten(), nn.Linear(28*28, 512), nn.ReLU())  #——————2——————
        self.layer2 = nn.Sequential(nn.Linear(512, 512), nn.ReLU())
        self.layer3 = nn.Sequential(nn.Linear(512, 512), nn.ReLU())
        self.layer4 = nn.Sequential(nn.Linear(512, 10), nn.Softmax(dim=-1))

        self.to(device)  #——————3——————
        self.opt = optim.SGD(self.parameters(), lr=0.01)  #——————4——————

    def forward(self, inputs):  #——————5——————
        x = self.layer1(inputs)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        return x

    def get_loss(self, true_labels, predicts):
        loss = -true_labels * torch.log(predicts)  #——————6——————
        loss = torch.mean(loss)
        return loss

    def train(self, imgs, labels):
        predicts = model(imgs)
        loss = self.get_loss(labels, predicts)
        self.opt.zero_grad()  #——————7——————
        loss.backward()       #——————8——————
        self.opt.step()       #——————9——————

model = ModelTest(device)
summary(model, (1, 28, 28), 3, device='cuda')  #——————10——————

#1: Get the device so that models and variables can be migrated between memories later; there are only two kinds of device name: 'cuda' and 'cpu'. You usually need this when you have a GPU, so that variables can be moved from main memory to video memory when needed. If you don't have a GPU, you can skip this step entirely: pytorch keeps all parameters in main memory by default.
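
Since the line above hard-codes 'cuda', a small variant (my addition, not part of the original code) is to fall back to the CPU when no GPU is available:

import torch

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)  # prints "cuda" or "cpu"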

#2: Definition of the layers in the model. nn.Sequential can be used to bundle several layers that you want to manage as a unit into a single layer.
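
As a quick illustration (a minimal sketch with standard torch.nn layers, not taken from the article's model), calling a Sequential simply applies its sub-layers in order:

import torch
from torch import nn

block = nn.Sequential(nn.Flatten(), nn.Linear(28*28, 512), nn.ReLU())
x = torch.randn(4, 1, 28, 28)   # a fake batch of 4 MNIST-sized images
y = block(x)                    # runs Flatten, then Linear, then ReLU
print(y.shape)                  # torch.Size([4, 512])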

#3: Migrate the model parameters to GPU memory during initialization to speed up computation; you can of course also migrate them externally with model.to(device) whenever needed.
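
For example, moving a module and a batch of inputs explicitly could look like the sketch below (my illustration with a stand-in module, not the article's class):

import torch
from torch import nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = nn.Linear(10, 2)
model.to(device)                    # moves all parameters and buffers in place
x = torch.randn(8, 10).to(device)   # tensors are not moved in place: .to() returns a new tensor
print(model(x).device)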

#4: Define the model's optimizer. Unlike TF, pytorch must be given the parameters that require gradient descent at definition time, which is what self.parameters() is for: it yields all the parameters of the current model. You don't actually have to worry about the order in which the layers and the optimizer are defined, because parameters() does not hand over copies of the parameter values but references to the parameter objects themselves, so the optimizer always updates the very tensors the model uses. The optimizer can of course also be defined externally by passing in model.parameters(). Here a stochastic gradient descent (SGD) optimizer is defined.
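
Defined outside the model, the same optimizer would look roughly like this (a sketch with a stand-in model, not the article's class):

import torch
from torch import nn, optim

model = nn.Sequential(nn.Flatten(), nn.Linear(28*28, 10))
# model.parameters() yields references to the parameter tensors themselves,
# so the optimizer updates the exact tensors the model uses.
opt = optim.SGD(model.parameters(), lr=0.01)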

#5: Forward propagation of the model, similar to TF's call(); this is the function that gets executed when you call model(inputs).
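
In other words, model(inputs) goes through nn.Module.__call__, which runs any registered hooks and then forward(); a tiny sketch of my own (not from the article) to show the dispatch:

import torch
from torch import nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)
    def forward(self, x):
        return self.fc(x)

m = Tiny()
x = torch.randn(3, 4)
print(torch.equal(m(x), m.forward(x)))  # True: m(x) dispatches to forward() via __call__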

#6: I integrated the function for computing the loss into the model; it calculates the cross entropy between the true labels and the predicted labels.
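
For comparison (my aside, not part of the article's code), torch also ships nn.CrossEntropyLoss, which expects raw logits and integer class indices instead of probabilities and one-hot labels, so the model's final softmax would be dropped if you switched to it:

import torch
from torch import nn

probs = torch.tensor([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1]])
one_hot = torch.tensor([[1., 0., 0.],
                        [0., 1., 0.]])
# The hand-rolled cross entropy from the article, over probabilities and one-hot labels:
manual = torch.mean(-one_hot * torch.log(probs))

# The built-in version takes raw logits and integer class labels:
logits = torch.log(probs)        # log-probabilities stand in for pre-softmax scores here
targets = torch.tensor([0, 1])
builtin = nn.CrossEntropyLoss()(logits, targets)
# The two differ by a factor of the number of classes, because the manual version
# averages over every entry of the matrix rather than per sample.
print(manual.item(), builtin.item())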

#7/8/9: In TF the parameter gradients are stored on the gradient tape, while in pytorch each parameter's gradient is attached to the parameter itself and can be inspected through its .grad attribute. Every time backward() is called on a loss, pytorch accumulates (directly sums) into those gradients the gradient of the loss with respect to every trainable parameter involved in computing it. So if we have no intention of accumulating gradients, we must clear the previous gradients before calling backward(). Since all the parameters to be trained were passed to the optimizer earlier, calling zero_grad() on the optimizer zeroes out the existing gradients of all of them. When is gradient accumulation useful, then? For example in batch gradient descent: when memory is not enough to compute the gradient of the entire batch at once, we can split the batch into chunks, call backward() on the loss of each chunk, and thus obtain the gradient of the whole batch. After the gradients have been computed, the optimizer's step() is executed, and the optimizer performs one optimization step based on the gradients of the trainable parameters.
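
A minimal sketch of that accumulation pattern (imgs_big and labels_big are hypothetical tensors holding the full batch; model, its opt attribute and get_loss are assumed to be the ones from the class above):

# Gradient accumulation sketch: split one big batch into n_chunks micro-batches.
n_chunks = 4
model.opt.zero_grad()                            # clear old gradients once per big batch
for imgs, labels in zip(imgs_big.chunk(n_chunks), labels_big.chunk(n_chunks)):
    loss = model.get_loss(labels, model(imgs)) / n_chunks   # scale so the total matches the full-batch loss
    loss.backward()                              # each parameter's .grad accumulates across chunks
model.opt.step()                                 # one update using the accumulated gradients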

#10: Use torchsummary's summary() to display the model structure. It is a bit odd that this is not built into torch itself and you have to install a separate torchsummary package.
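
If you only want a quick look without installing anything extra, printing the module at least lists its registered sub-layers (my aside; it does not show parameter counts the way torchsummary does):

print(model)   # prints the nn.Module hierarchy of ModelTest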

2 Training and visualization

Next, let's train the model. Since the MNIST dataset that ships with pytorch was awkward for me to use here, I used the one bundled with Keras and defined a generator to fetch the data. Below is the full training and plotting code (accuracy is recorded once every 50 iterations):

import numpy as np
import matplotlib.pyplot as plt
import torch
from torch import nn, optim
from torchsummary import summary
from keras.datasets import mnist
from keras.utils import to_categorical

device = torch.device('cuda')  #——————1——————

class ModelTest(nn.Module):
    def __init__(self, device):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Flatten(), nn.Linear(28*28, 512), nn.ReLU())  #——————2——————
        self.layer2 = nn.Sequential(nn.Linear(512, 512), nn.ReLU())
        self.layer3 = nn.Sequential(nn.Linear(512, 512), nn.ReLU())
        self.layer4 = nn.Sequential(nn.Linear(512, 10), nn.Softmax(dim=-1))

        self.to(device)  #——————3——————
        self.opt = optim.SGD(self.parameters(), lr=0.01)  #——————4——————

    def forward(self, inputs):  #——————5——————
        x = self.layer1(inputs)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        return x

    def get_loss(self, true_labels, predicts):
        loss = -true_labels * torch.log(predicts)  #——————6——————
        loss = torch.mean(loss)
        return loss

    def train(self, imgs, labels):
        predicts = model(imgs)
        loss = self.get_loss(labels, predicts)
        self.opt.zero_grad()  #——————7——————
        loss.backward()       #——————8——————
        self.opt.step()       #——————9——————

def get_data(device, is_train=True, batch=1024, num=10000):
    train_data, test_data = mnist.load_data()
    if is_train:
        imgs, labels = train_data
    else:
        imgs, labels = test_data
    imgs = (imgs/255*2 - 1)[:, np.newaxis, ...]    # scale to [-1, 1] and add a channel axis
    labels = to_categorical(labels, 10)            # one-hot encode the labels
    imgs = torch.tensor(imgs, dtype=torch.float32).to(device)
    labels = torch.tensor(labels, dtype=torch.float32).to(device)
    i = 0
    while True:
        i += batch
        if i > num:
            i = batch
        yield imgs[i-batch:i], labels[i-batch:i]

train_dg = get_data(device, True, batch=4096, num=60000)
test_dg = get_data(device, False, batch=5000, num=10000)

model = ModelTest(device)
summary(model, (1, 28, 28), 11, device='cuda')
ACCs = []
import time
start = time.time()
for j in range(20000):
    # Training
    imgs, labels = next(train_dg)
    model.train(imgs, labels)

    # Verify
    img, label = next(test_dg)
    predicts = model(img)
    acc = 1 - torch.count_nonzero(torch.argmax(predicts, axis=1) - torch.argmax(label, axis=1)) / label.shape[0]
    if j % 50 == 0:
        t = time.time() - start
        start = time.time()
        ACCs.append(acc.cpu().numpy())
        print(j, t, 'ACC: ', acc)
# Drawing
x = np.linspace(0, len(ACCs), len(ACCs))
plt.plot(x, ACCs)
plt.show()

A graph of the change in accuracy is shown below:

3 Cautions

Note that a pytorch tensor on the CPU can share memory with a numpy array. If you insert a tensor into a list and later modify the tensor, the element in the list changes too, because the list only stores a reference to the same object. What is even easier to overlook is that even if you convert the tensor to an array with tensor.numpy() before inserting it into the list, modifying the original tensor will still modify the array in the list, because the array returned by tensor.numpy() shares the tensor's underlying memory rather than copying it. So if we only want to save the value of the tensor rather than the object itself, we have to copy the value, for example with np.array(tensor).
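
A small demonstration of that sharing behaviour (a sketch; the tensor values are arbitrary):

import numpy as np
import torch

t = torch.zeros(3)
saved = [t, t.numpy(), np.array(t)]   # a reference, a shared-memory view, and a real copy

t += 1                                # modify the original tensor in place
print(saved[0])                       # tensor([1., 1., 1.])  -> the stored tensor changed
print(saved[1])                       # [1. 1. 1.]            -> changed too, memory is shared
print(saved[2])                       # [0. 0. 0.]            -> unchanged, np.array() copied the values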

That covers the details of how to build a model with Pytorch. For more information on building models with Pytorch, please see my other related articles!