torch.nn.RNN(input_size, hidden_size, num_layers=1, nonlinearity='tanh', bias=True, batch_first=False, dropout=0, bidirectional=False)
Parameter description
- input_size: the dimension of the input features. In an RNN the input at each time step is typically a word vector, so input_size equals the word-vector dimension.
- hidden_size: the number of hidden-layer units, also called the output dimension (because the RNN outputs a hidden state at each time step).
- num_layers: the number of stacked RNN layers.
- nonlinearity: the activation function ('tanh' or 'relu').
- bias: whether a bias term is used.
- batch_first: the layout of the input data; False by default, meaning the input has shape (seq(num_steps), batch, input_dim), i.e. the sequence length comes first and the batch second.
- dropout: whether dropout is applied; disabled by default. To enable it, set it to a number between 0 and 1.
- bidirectional: whether to use a bidirectional RNN; False by default.
- Note that the default values of some parameters are shown in the signature above; a short sketch illustrating a few of them follows below.
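A minimal sketch (assuming the layer in question is PyTorch's torch.nn.RNN; the concrete sizes below are chosen only for illustration) of how a few of these arguments affect the returned shapes:

```python
import torch
from torch import nn

# batch_first=True switches the input/output layout to (batch, seq, feature)
rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=2,
             nonlinearity='tanh', batch_first=True, bidirectional=True)

x = torch.rand(4, 7, 10)   # (batch=4, seq_len=7, input_size=10)
out, h = rnn(x)            # the initial hidden state defaults to zeros when omitted

print(out.shape)  # torch.Size([4, 7, 40]) -> hidden_size * 2 directions
print(h.shape)    # torch.Size([4, 4, 20]) -> num_layers * 2 directions, batch, hidden_size
```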
Input and output shape
- input_shape = [Time steps, batch size, feature dimension] = [num_steps(seq_length), batch_size, input_dim]
- After the forward computation, an output and a hidden state h are returned. The output refers to the hidden states computed by the hidden layer at each time step; these are usually used as inputs to a subsequent output layer. Note that this "output" does not itself involve any output-layer computation; its shape is (time steps, batch size, number of hidden units). The hidden state refers to the hidden state of the hidden layer at the last time step; when there are multiple hidden layers, the hidden state of each layer is recorded in this variable. For LSTM (long short-term memory), the hidden state is a tuple (h, c), i.e. hidden state and cell state (a plain RNN has only a single value; see the LSTM sketch below). The shape of the hidden state h is (number of layers, batch size, number of hidden units).
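For example, with an LSTM the second return value is a tuple (h, c) rather than a single tensor; a minimal sketch (assuming PyTorch's nn.LSTM, with sizes chosen only for illustration):

```python
import torch
from torch import nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=1)

x = torch.rand(5, 3, 10)      # (num_steps=5, batch_size=3, input_size=10)
output, (h, c) = lstm(x)      # hidden state is a tuple: (hidden state, cell state)

print(output.shape)  # torch.Size([5, 3, 20]) -> (time steps, batch size, hidden units)
print(h.shape)       # torch.Size([1, 3, 20]) -> (layers, batch size, hidden units)
print(c.shape)       # torch.Size([1, 3, 20]) -> same shape as h
```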
Code
import torch
from torch import nn

vocab_size, num_hiddens = 1027, 256
rnn_layer = nn.RNN(input_size=vocab_size, hidden_size=num_hiddens)  # define the model
num_steps = 35
batch_size = 2
state = None  # the initial hidden state may be left undefined (it defaults to zeros)
X = torch.rand(num_steps, batch_size, vocab_size)  # example input of shape (num_steps, batch_size, input_dim)
Y, state_new = rnn_layer(X, state)
print(Y.shape, len(state_new), state_new.shape)
Output
torch.Size([35, 2, 256]) 1 torch.Size([1, 2, 256])
Specific calculation process
$H_t = \text{input} * W_{xh} + H_{t-1} * W_{hh} + \text{bias}$

Shape-wise: [batch_size, input_dim] * [input_dim, num_hiddens] + [batch_size, num_hiddens] * [num_hiddens, num_hiddens] + bias. (In nn.RNN the result is additionally passed through the nonlinearity, e.g. tanh.)
You can see that the hidden state at each time step has shape [batch_size, num_hiddens], and the output at each step has the same shape.
Note: for convenience, the shapes above assume num_steps = 1.
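To see this concretely, here is a sketch (assuming PyTorch's nn.RNN and its parameter attributes weight_ih_l0, weight_hh_l0, bias_ih_l0, bias_hh_l0) that reproduces a single step of the recurrence by hand and compares it with the layer's own output:

```python
import torch
from torch import nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=4, hidden_size=3)

x = torch.rand(1, 2, 4)    # one time step: (num_steps=1, batch_size=2, input_dim=4)
h0 = torch.zeros(1, 2, 3)  # (layers, batch_size, num_hiddens)

# Manual step: H_t = tanh(X @ W_xh^T + b_ih + H_{t-1} @ W_hh^T + b_hh)
# PyTorch stores W_xh as weight_ih_l0 with shape (num_hiddens, input_dim), hence the transpose.
h_manual = torch.tanh(x[0] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                      + h0[0] @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)

out, h_new = rnn(x, h0)
print(h_manual.shape)                    # torch.Size([2, 3]) -> [batch_size, num_hiddens]
print(torch.allclose(out[0], h_manual))  # True
```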
The parameters of GRU/LSTM are essentially the same as those of the RNN above.
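For instance (a minimal sketch, with sizes chosen only for illustration), nn.GRU and nn.LSTM accept the same constructor arguments as nn.RNN, except that they have no nonlinearity argument:

```python
from torch import nn

# Same keyword arguments as nn.RNN (minus `nonlinearity`)
gru = nn.GRU(input_size=10, hidden_size=20, num_layers=2,
             batch_first=True, dropout=0.2, bidirectional=True)
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2,
               batch_first=True, dropout=0.2, bidirectional=True)
```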