
Hands-on implementation of an LSTM module in Python with PyTorch

Introduction to LSTM:

LSTM is one of the more popular network modules among RNNs. It mainly consists of the inputs, an input gate, an output gate, a forget gate, activation functions, a fully connected cell unit, and the outputs.

It is structured as follows:
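
For reference, the standard LSTM cell equations (the same form implemented by the code below, and also documented for PyTorch's nn.LSTM) are, writing σ for the sigmoid function and * for element-wise multiplication:

i_t = σ(W_ii x_t + b_ii + W_hi h_{t-1} + b_hi)      (input gate)
f_t = σ(W_if x_t + b_if + W_hf h_{t-1} + b_hf)      (forget gate)
g_t = tanh(W_ig x_t + b_ig + W_hg h_{t-1} + b_hg)   (cell input)
o_t = σ(W_io x_t + b_io + W_ho h_{t-1} + b_ho)      (output gate)
c_t = f_t * c_{t-1} + i_t * g_t                     (new cell state)
h_t = o_t * tanh(c_t)                               (new hidden state)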

Without working through the above formulas in detail, we just need to remember the following points:

  • At the current time step, the LSTM module receives the current input value, the previous time step's output value, and the previous time step's cell state and hidden-layer output. Because the three gates and the cell each have their own weights over the input and the hidden state, an LSTM module has roughly four times the parameters and computation of an ordinary fully connected layer.
  • A so-called gate feeds the current input and the previous output through a sigmoid activation to obtain a probability-like value that determines how strongly the current signal passes through. This value is then multiplied element-wise with the signal being gated to obtain the actual value after gating (see the small sketch after this list).
  • The activation functions of the gates are all sigmoid, with range (0, 1), while the activation functions of the cell input and the output are tanh, with range (-1, 1).
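
Below is a minimal, standalone sketch of that gating idea (toy tensors only; it is not part of the LSTM class that follows):

import torch

signal = torch.tensor([1.0, -2.0, 3.0])                # values to be gated
gate = torch.sigmoid(torch.tensor([5.0, 0.0, -5.0]))   # gate activations in (0, 1)
gated = gate * signal                                   # element-wise product, not a matrix product
print(gated)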

Pytorch is implemented as follows:

import torch
import torch.nn as nn
from torch.nn import Parameter
from torch.nn import init
from torch import Tensor
from typing import Tuple
import math

class NaiveLSTM(nn.Module):
    """Naive LSTM like nn.LSTM"""
    def __init__(self, input_size: int, hidden_size: int):
        super(NaiveLSTM, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size

        # input gate
        self.w_ii = Parameter(Tensor(hidden_size, input_size))
        self.w_hi = Parameter(Tensor(hidden_size, hidden_size))
        self.b_ii = Parameter(Tensor(hidden_size, 1))
        self.b_hi = Parameter(Tensor(hidden_size, 1))

        # forget gate
        self.w_if = Parameter(Tensor(hidden_size, input_size))
        self.w_hf = Parameter(Tensor(hidden_size, hidden_size))
        self.b_if = Parameter(Tensor(hidden_size, 1))
        self.b_hf = Parameter(Tensor(hidden_size, 1))

        # output gate
        self.w_io = Parameter(Tensor(hidden_size, input_size))
        self.w_ho = Parameter(Tensor(hidden_size, hidden_size))
        self.b_io = Parameter(Tensor(hidden_size, 1))
        self.b_ho = Parameter(Tensor(hidden_size, 1))

        # cell
        self.w_ig = Parameter(Tensor(hidden_size, input_size))
        self.w_hg = Parameter(Tensor(hidden_size, hidden_size))
        self.b_ig = Parameter(Tensor(hidden_size, 1))
        self.b_hg = Parameter(Tensor(hidden_size, 1))

        self.reset_weigths()

    def reset_weigths(self):
        """reset weights
        """
        stdv = 1.0 / math.sqrt(self.hidden_size)
        for weight in self.parameters():
            init.uniform_(weight, -stdv, stdv)

    def forward(self, inputs: Tensor, state: Tuple[Tensor]) \
        -> Tuple[Tensor, Tuple[Tensor, Tensor]]:
        """Forward
        Args:
            inputs: [1, 1, input_size]
            state: ([1, 1, hidden_size], [1, 1, hidden_size])
        """
#         seq_size, batch_size, _ = inputs.size()

        if state is None:
            h_t = torch.zeros(1, self.hidden_size).t()
            c_t = torch.zeros(1, self.hidden_size).t()
        else:
            (h, c) = state
            h_t = h.squeeze(0).t()
            c_t = c.squeeze(0).t()

        hidden_seq = []

        seq_size = 1
        for t in range(seq_size):
            x = inputs[:, t, :].t()
            # input gate
            i = torch.sigmoid(self.w_ii @ x + self.b_ii +
                              self.w_hi @ h_t + self.b_hi)
            # forget gate
            f = torch.sigmoid(self.w_if @ x + self.b_if +
                              self.w_hf @ h_t + self.b_hf)
            # cell
            g = torch.tanh(self.w_ig @ x + self.b_ig +
                           self.w_hg @ h_t + self.b_hg)
            # output gate
            o = torch.sigmoid(self.w_io @ x + self.b_io +
                              self.w_ho @ h_t + self.b_ho)

            c_next = f * c_t + i * g
            h_next = o * torch.tanh(c_next)
            c_next_t = c_next.t().unsqueeze(0)
            h_next_t = h_next.t().unsqueeze(0)
            hidden_seq.append(h_next_t)

        hidden_seq = torch.cat(hidden_seq, dim=0)
        return hidden_seq, (h_next_t, c_next_t)

def reset_weigths(model):
    """reset weights
    """
    for weight in model.parameters():
        init.constant_(weight, 0.5)
### test 
inputs = torch.ones(1, 1, 10)
h0 = torch.ones(1, 1, 20)
c0 = torch.ones(1, 1, 20)
print(h0.shape, h0)
print(c0.shape, c0)
print(inputs.shape, inputs)
# test naive_lstm with input_size=10, hidden_size=20
naive_lstm = NaiveLSTM(10, 20)
reset_weigths(naive_lstm)
output1, (hn1, cn1) = naive_lstm(inputs, (h0, c0))
print(hn1.shape, cn1.shape, output1.shape)
print(hn1)
print(cn1)
print(output1)

Compare official realizations:

# Use official lstm with input_size=10, hidden_size=20
lstm = nn.LSTM(10, 20)
reset_weigths(lstm)
output2, (hn2, cn2) = lstm(inputs, (h0, c0))
print(hn2.shape, cn2.shape, output2.shape)
print(hn2)
print(cn2)
print(output2)

You can see that, aside from slight differences in how the results are printed, the outputs match the official implementation.
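
One reason the constant 0.5 reset works for both models is that nn.LSTM simply packs the weights and biases of the four gates into larger tensors (weight_ih_l0, weight_hh_l0, bias_ih_l0, bias_hh_l0). As a quick check on the lstm object created above:

for name, param in lstm.named_parameters():
    print(name, param.shape)
# weight_ih_l0: torch.Size([80, 10])  -- (4*hidden_size, input_size)
# weight_hh_l0: torch.Size([80, 20])  -- (4*hidden_size, hidden_size)
# bias_ih_l0 and bias_hh_l0: torch.Size([80])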

This concludes this article on implementing an LSTM module by hand in Python with PyTorch. For more on implementing LSTM modules in Python, please search my earlier articles or continue browsing the related articles below. I hope you will support me in the future!