A network you build yourself must match the network that ships with torchvision exactly in structure, dimensions, and parameter names; otherwise the torchvision weights file (the .pth file) cannot be loaded.
If they do not match, you can compare the two state dictionaries and copy the weights entry by entry; see:
Solution for PyTorch loading a pre-trained model that doesn't match your own model
import torch
import torchvision
import cv2 as cv
from util import letter_box  # the module names were elided in the original; adjust to your project
from model import ResNet18  # assumed module name

model1 = ResNet18(1)  # custom network with a single output
model2 = torchvision.models.resnet18(progress=False)  # network shipped with torchvision
model2.fc = torch.nn.Linear(512, 1)  # replace the fully connected layer
# print(model)
model_dict1 = model1.state_dict()
model_dict2 = torch.load('')  # path to the .pth weights file (elided in the original)
model_list1 = list(model_dict1.keys())
model_list2 = list(model_dict2.keys())
len1 = len(model_list1)
len2 = len(model_list2)
minlen = min(len1, len2)
for n in range(minlen):
    # copy the weights position by position, skipping entries whose shapes differ
    if model_dict1[model_list1[n]].shape != model_dict2[model_list2[n]].shape:
        continue
    model_dict1[model_list1[n]] = model_dict2[model_list2[n]]
model1.load_state_dict(model_dict1)
missing, unexpected = model2.load_state_dict(model_dict2)
image = cv.imread('')  # path to a test image (elided in the original)
image = letter_box(image, 224)
image = image[:, :, ::-1].transpose(2, 0, 1)  # BGR -> RGB, HWC -> CHW
print('Network loading complete.')
model1.eval()
model2.eval()
with torch.no_grad():
    image = torch.tensor(image / 256, dtype=torch.float32).unsqueeze(0)
    predict1 = model1(image)
    predict2 = model2(image)
print('finished')
# torch.save(model.state_dict(), '')
The above is the full procedure. At the end it lets you test whether the output of the original model equals the output of the custom model loaded with the same weights.
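To make that final check explicit, here is a minimal sketch of the comparison (my own illustration; it assumes predict1 and predict2 come from the script above and that every weight tensor was actually copied):

import torch

# The two models were filled from the same weights, so on the same input
# their outputs should agree up to floating-point tolerance.
print('outputs match:', torch.allclose(predict1, predict2, atol=1e-6))
print('max abs difference:', (predict1 - predict2).abs().max().item())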
Supplementary: building a ResNet classification network with PyTorch and training it via transfer learning
With stride=1 and padding=1, a 3x3 convolution does not change the height and width of the feature matrix.
When using a BN layer, set bias=False in the convolution (the BN layer's output is the same with or without the bias), and place the BN layer between the conv and ReLU layers; see the sketch below.
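A minimal sketch of this conv-BN-ReLU pattern (the layer sizes are illustrative, not from the original):

import torch
import torch.nn as nn

# A 3x3 convolution with stride=1 and padding=1 keeps H and W unchanged;
# bias=False because the following BN layer makes the bias redundant.
block = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True))

x = torch.randn(1, 64, 56, 56)
print(block(x).shape)  # torch.Size([1, 64, 56, 56]): same height and width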
A review of the BN layer:
A Batch Norm layer normalizes each layer's activations and then applies a linear transformation to improve the data distribution; the linear transformation is learnable.
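For concreteness, a small sketch (my own illustration, not from the original) that reproduces what nn.BatchNorm2d computes in training mode:

import torch
import torch.nn as nn

x = torch.randn(8, 3, 4, 4)
bn = nn.BatchNorm2d(3)  # gamma = 1, beta = 0 at initialization; both are learnable
bn.train()

# normalize each channel over the (N, H, W) dimensions, then apply gamma and beta
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
x_hat = (x - mean) / torch.sqrt(var + bn.eps)
manual = bn.weight.view(1, -1, 1, 1) * x_hat + bn.bias.view(1, -1, 1, 1)

print(torch.allclose(bn(x), manual, atol=1e-6))  # True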
Batch Norm advantages: it mitigates overfitting; it improves gradient propagation (weights are neither too large nor too small); it allows higher learning rates and can speed up training; it reduces the strong dependence on weight initialization, keeping the data in the non-saturated region of the activation function, which to some extent alleviates the vanishing-gradient problem; and it acts as a form of regularization, reducing the need for dropout to some extent.
Batch Norm layer placement: there is no settled convention on whether it should go before or after the activation layer (e.g. ReLU).
BN together with Dropout: the introduction of Batch Norm has reduced the use of dropout, but Batch Norm cannot replace dropout completely; keeping a small dropout rate, such as 0.2, may be more effective (see the sketch below).
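As an illustration (a sketch of my own, not from the original), a classifier head that keeps a small dropout rate alongside BN:

import torch.nn as nn

head = nn.Sequential(
    nn.Linear(512, 256),
    nn.BatchNorm1d(256),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.2),  # keep a small dropout rate rather than dropping dropout entirely
    nn.Linear(256, 5))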
Why normalize first and then use the γ, β linear transformation to restore something close to the original? Isn't that redundant?
Under certain conditions the γ, β transform can correct the distribution of the original data (its variance and mean become the new values γ and β); when the original distribution is already good enough, the transform reduces to an identity mapping and leaves the distribution unchanged. Without BN, the variance and mean depend on the parameters of all preceding layers through complex nonlinear interactions. After BN, the new output γH′ + β is determined only by γ and β, independent of the preceding layers' parameters, so the new parameters are easy to learn by gradient descent and can learn a better distribution. A numerical check of the identity-mapping case follows below.
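A quick numerical check of the identity-mapping claim (an illustrative sketch of my own): if γ is set to the batch standard deviation and β to the batch mean, BN reproduces its input almost exactly.

import torch
import torch.nn as nn

x = torch.randn(8, 1, 4, 4)
bn = nn.BatchNorm2d(1)
bn.train()

with torch.no_grad():
    # choose gamma = std and beta = mean, so the linear transform undoes the normalization
    bn.weight.fill_(x.std(unbiased=False).item())
    bn.bias.fill_(x.mean().item())

print(torch.allclose(bn(x), x, atol=1e-4))  # True: BN acts as an identity mapping here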
Transfer learning: importing and downloading the weights:
import torchvision.models.resnet  # ctrl + left-click the model name to jump to the source and the weight download URL
net = resnet34()  # don't set num_classes here: load the pre-trained parameters first, then replace the fully connected layer
# official method for loading a pre-trained model
model_weight_path = "./"  # path to the downloaded weights (elided in the original)
missing_keys, unexpected_keys = net.load_state_dict(torch.load(model_weight_path), strict=False)  # load model weights
inchannel = net.fc.in_features
net.fc = nn.Linear(inchannel, 5)  # redefine the fully connected layer
Full Code:
MODEL section:
import torch.nn as nn
import torch


class BasicBlock(nn.Module):
    # residual structure for the 18- and 34-layer networks (covers both the solid- and dashed-line variants)
    expansion = 1  # whether the convolutional layers on the main branch of the residual structure have the same number of kernels: 1 means the same; 4 means the third layer has four times as many as the first and second

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):  # downsample marks the dashed-line residual structure
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)  # output of the shortcut branch

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        out += identity
        out = self.relu(out)
        return out  # final output of the residual structure


class Bottleneck(nn.Module):
    # residual structure for the 50-, 101- and 152-layer networks
    expansion = 4  # the third layer has four times as many convolution kernels as the first and second layers

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=1, stride=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, bias=False, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.conv3 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel * self.expansion,
                               kernel_size=1, stride=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        out += identity
        out = self.relu(out)
        return out


class ResNet(nn.Module):
    # the framework of the whole network
    # blocks_num is a list giving the number of residual structures in each stage; block selects the residual module

    def __init__(self, block, blocks_num, num_classes=1000, include_top=True):
        super(ResNet, self).__init__()
        self.include_top = include_top
        self.in_channel = 64  # depth of the feature matrix after the first pooling layer

        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2,
                               padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, blocks_num[0])
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # output size = (1, 1)
            self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def _make_layer(self, block, channel, block_num, stride=1):
        # channel: number of convolution kernels in the first convolutional layer of the residual structure
        downsample = None
        if stride != 1 or self.in_channel != channel * block.expansion:
            # the 18- and 34-layer networks skip this branch for layer1
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion))

        layers = []
        layers.append(block(self.in_channel, channel, downsample=downsample, stride=stride))
        self.in_channel = channel * block.expansion

        for _ in range(1, block_num):
            layers.append(block(self.in_channel, channel))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        if self.include_top:  # True by default
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)

        return x


def resnet34(num_classes=1000, include_top=True):
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


def resnet101(num_classes=1000, include_top=True):
    return ResNet(Bottleneck, [3, 4, 23, 3], num_classes=num_classes, include_top=include_top)
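A quick sanity check of the model definition (my own usage example):

import torch

net = resnet34(num_classes=5)
x = torch.randn(1, 3, 224, 224)
print(net(x).shape)  # torch.Size([1, 5])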
Training section:
import torch
import torch.nn as nn
from torchvision import transforms, datasets
import json
import matplotlib.pyplot as plt
import os
import torch.optim as optim
from model import resnet34, resnet101
import torchvision.models.resnet  # ctrl + left-click to jump to the weight download URL

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

data_transform = {
    "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                 transforms.RandomHorizontalFlip(),
                                 transforms.ToTensor(),
                                 transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]),
    # stay consistent with the official normalization
    "val": transforms.Compose([transforms.Resize(256),
                               transforms.CenterCrop(224),
                               transforms.ToTensor(),
                               transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])}

data_root = os.path.abspath(os.path.join(os.getcwd(), "../.."))  # get data root path
image_path = data_root + "/data_set/flower_data/"  # flower data set path

train_dataset = datasets.ImageFolder(root=image_path + "train",
                                     transform=data_transform["train"])
train_num = len(train_dataset)

# {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
flower_list = train_dataset.class_to_idx
cla_dict = dict((val, key) for key, val in flower_list.items())
# write dict into json file
json_str = json.dumps(cla_dict, indent=4)
with open('class_indices.json', 'w') as json_file:
    json_file.write(json_str)

batch_size = 16
train_loader = torch.utils.data.DataLoader(train_dataset,
                                           batch_size=batch_size, shuffle=True,
                                           num_workers=0)

validate_dataset = datasets.ImageFolder(root=image_path + "val",
                                        transform=data_transform["val"])
val_num = len(validate_dataset)
validate_loader = torch.utils.data.DataLoader(validate_dataset,
                                              batch_size=batch_size, shuffle=False,
                                              num_workers=0)

net = resnet34()  # load the pre-trained parameters first, then replace the fully connected layer
# official method for loading a pre-trained model
model_weight_path = "./"  # path to the downloaded weights (elided in the original)
missing_keys, unexpected_keys = net.load_state_dict(torch.load(model_weight_path),
                                                    strict=False)  # load model weights
inchannel = net.fc.in_features
net.fc = nn.Linear(inchannel, 5)  # redefine the fully connected layer
net.to(device)

loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.0001)

best_acc = 0.0
save_path = './'  # path for saving the best weights (elided in the original)
for epoch in range(3):
    # train
    net.train()  # controls the state of the BN layers
    running_loss = 0.0
    for step, data in enumerate(train_loader, start=0):
        images, labels = data
        optimizer.zero_grad()
        logits = net(images.to(device))
        loss = loss_function(logits, labels.to(device))
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        # print train process
        rate = (step + 1) / len(train_loader)
        a = "*" * int(rate * 50)
        b = "." * int((1 - rate) * 50)
        print("\rtrain loss: {:^3.0f}%[{}->{}]{:.4f}".format(int(rate * 100), a, b, loss), end="")
    print()

    # validate
    net.eval()  # controls the state of the BN layers
    acc = 0.0  # accumulate accurate number / epoch
    with torch.no_grad():
        for val_data in validate_loader:
            val_images, val_labels = val_data
            outputs = net(val_images.to(device))  # eval model only has the last output layer
            # loss = loss_function(outputs, test_labels)
            predict_y = torch.max(outputs, dim=1)[1]
            acc += (predict_y == val_labels.to(device)).sum().item()
        val_accurate = acc / val_num
        if val_accurate > best_acc:
            best_acc = val_accurate
            torch.save(net.state_dict(), save_path)
        print('[epoch %d] train_loss: %.3f  test_accuracy: %.3f' %
              (epoch + 1, running_loss / step, val_accurate))

print('Finished Training')
Prediction section:
import torch
from model import resnet34
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt
import json

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

data_transform = transforms.Compose(
    [transforms.Resize(256),
     transforms.CenterCrop(224),
     transforms.ToTensor(),
     transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])  # use the same normalization as in training

# load image
img = Image.open("../")  # path to the test image (elided in the original)
plt.imshow(img)
# [N, C, H, W]
img = data_transform(img)
# expand batch dimension
img = torch.unsqueeze(img, dim=0)

# read class_indict
try:
    json_file = open('./class_indices.json', 'r')
    class_indict = json.load(json_file)
except Exception as e:
    print(e)
    exit(-1)

# create model
model = resnet34(num_classes=5)
# load model weights
model_weight_path = "./"  # path to the trained weights (elided in the original)
model.load_state_dict(torch.load(model_weight_path, map_location=device))  # load the parameters of the trained model
model.eval()  # use eval() mode
with torch.no_grad():  # do not track gradients
    # predict class
    output = torch.squeeze(model(img))  # squeeze the batch dimension
    predict = torch.softmax(output, dim=0)  # get the probability distribution via softmax
    predict_cla = torch.argmax(predict).numpy()  # find the index of the maximum value
print(class_indict[str(predict_cla)], predict[predict_cla].numpy())  # print category and probability
plt.show()
The above is my personal experience; I hope it can serve as a reference, and I hope you will continue to support me.