
PyTorch DataLoader shuffle validation method

With shuffle = False, the data is returned in its original order.

With shuffle = True, the order is randomly shuffled anew each time the loader is iterated (i.e., each epoch).

import numpy as np
import h5py
import torch
from torch.utils.data import DataLoader, Dataset

# Build a small HDF5 file: 4 data rows and 4 matching label rows.
h5f = h5py.File('train.h5', 'w')
data1 = np.array([[1, 2, 3],
                  [2, 5, 6],
                  [3, 5, 6],
                  [4, 5, 6]])
data2 = np.array([[1, 1, 1],
                  [1, 2, 6],
                  [1, 3, 6],
                  [1, 4, 6]])
h5f.create_dataset('data', data=data1)
h5f.create_dataset('label', data=data2)
h5f.close()

class H5Dataset(Dataset):  # renamed so it no longer shadows torch's Dataset
    def __init__(self):
        h5f = h5py.File('train.h5', 'r')
        self.data = h5f['data']
        self.label = h5f['label']

    def __getitem__(self, index):
        # h5py indexing returns a numpy array, which from_numpy wraps
        data = torch.from_numpy(self.data[index])
        label = torch.from_numpy(self.label[index])
        return data, label

    def __len__(self):
        assert self.data.shape[0] == self.label.shape[0], "wrong data length"
        return self.data.shape[0]

dataset_train = H5Dataset()
loader_train = DataLoader(dataset=dataset_train,
                          batch_size=2,
                          shuffle=True)

for i, data in enumerate(loader_train):
    train_data, label = data
    print(train_data)
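For contrast, here is a minimal sketch of the shuffle=False case on the same dataset; because the four rows are fixed, the printed batches are fully determined:

loader_ordered = DataLoader(dataset=dataset_train,
                            batch_size=2,
                            shuffle=False)
for train_data, label in loader_ordered:
    print(train_data)
# Always prints the rows in file order:
# tensor([[1, 2, 3],
#         [2, 5, 6]])
# tensor([[3, 5, 6],
#         [4, 5, 6]])
# With shuffle=True, the row order changes every time the loader is iterated.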
 

PyTorch DataLoader usage details

Background:

At first I was puzzled about data augmentation: I could only find data transforms (torchvision.transforms) and no explicit augmentation step. Then I figured it out: in PyTorch, data augmentation is accomplished by the joint action of random transforms + the DataLoader + multiple epochs, because the random transforms are re-applied on every epoch.

In total, the following data transformations are applied:

from torchvision import transforms

composed = transforms.Compose([transforms.Resize((448, 448)),   # resize
                               transforms.RandomCrop(300),      # random crop
                               transforms.ToTensor(),
                               transforms.Normalize(mean=[0.5, 0.5, 0.5],  # normalize
                                                    std=[0.5, 0.5, 0.5])])
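Why does this pipeline augment the data rather than merely preprocess it? RandomCrop draws a new crop window on every call, so applying composed twice to the same image gives different tensors. A minimal sketch, assuming example.jpg is any RGB image (the file name is hypothetical):

import torch
from PIL import Image

img = Image.open("example.jpg")  # hypothetical file; Resize makes any input size work
t1 = composed(img)
t2 = composed(img)
print(t1.shape)              # torch.Size([3, 300, 300]) after the 300x300 crop
print(torch.equal(t1, t2))   # almost always False: a different crop each call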

A simple data-reading class that returns each image in PIL format (or the transformed result when a transform is supplied):

import csv
import os
from PIL import Image
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, labels_file, root_dir, transform=None):
        # The first CSV column is assumed to hold the image file names.
        with open(labels_file) as csvfile:
            self.labels_file = list(csv.reader(csvfile))
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.labels_file)

    def __getitem__(self, idx):
        im_name = os.path.join(self.root_dir, self.labels_file[idx][0])
        im = Image.open(im_name)

        if self.transform:
            im = self.transform(im)

        return im
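As a quick sanity check, the class also works on its own, without a DataLoader. In this sketch, labels.csv is a hypothetical file name; its first column is assumed to hold image file names relative to root_dir:

ds = MyDataset("F:/test_temp/labels.csv", "F:/test_temp")  # hypothetical CSV name
print(len(ds))   # number of rows in the CSV
im = ds[0]       # a PIL image, since no transform was passed
print(im.size)   # (width, height) of the raw image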

Here is the main program:

import matplotlib.pyplot as plt

labels_file = "F:/test_temp/"
root_dir = "F:/test_temp"
dataset_transform = MyDataset(labels_file, root_dir, transform=composed)
dataloader = DataLoader(dataset_transform, batch_size=1, shuffle=False)
# The original dataset has 3 images in total; with batch_size=1 and
# 2 epochs, all of them are displayed (6 in total).
for epoch in range(2):
    plt.figure(figsize=(6, 6))
    for ind, i in enumerate(dataloader):
        # (C, H, W) tensor -> (H, W, C) numpy array for imshow
        a = i[0, :, :, :].numpy().transpose((1, 2, 0))
        plt.subplot(1, 3, ind + 1)
        plt.imshow(a)
    plt.show()

As the displayed images show, the original images are transformed anew at each epoch: the random crop lands in a different place on every pass, and this re-randomization is exactly what augments the data.
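The same conclusion can be checked numerically instead of visually: with shuffle=False, the first batch of every pass over the loader is the same image, so any difference between passes must come from the random transform. A minimal sketch (torch is imported as in the snippets above):

first_batch = next(iter(dataloader))    # first image, one random crop
second_batch = next(iter(dataloader))   # same image, a freshly drawn crop
print(torch.equal(first_batch, second_batch))   # False whenever the crops differ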

The above is based on my personal experience; I hope it serves as a useful reference, and I appreciate your continued support.