PyTorch Study Notes: nn.RNN()

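This post walks through the nn.RNN class in PyTorch: what its parameters mean, the shapes of its inputs and outputs, and the computation it performs. The two main parameters of nn.RNN are input_size and hidden_size, and the class is used to build sequence-based recurrent neural networks. The forward pass returns the hidden-state output at every time step together with the hidden state at the last time step. A worked example shows how the dimensions change, and reading the source code is recommended for a deeper understanding.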

PyTorch uses the nn.RNN class to build sequence-based recurrent neural networks. Its constructor is:
nn.RNN(input_size, hidden_size, num_layers=1, nonlinearity='tanh', bias=True, batch_first=False, dropout=0, bidirectional=False)

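  1. The structure of an RNN:
    [Figure: RNN structure]
    An RNN can be viewed as multiple copies of the same neural network, where each module passes a message on to the next. Unrolling this structure gives:
    [Figure: RNN unrolled over time steps]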
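  2. The parameters are:
  • input_size: The number of expected features in the input x, i.e. the dimension of the input features. The input to an RNN is usually a word vector, in which case input_size equals the dimension of one word vector.
  • hidden_size: The number of features in the hidden state h, i.e. the number of hidden units. It is also the output dimension, because the RNN's output is the hidden state at each time step.
  • num_layers: Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1
    That is, the number of layers in the network.
  • nonlinearity: The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'. That is, the activation function.
  • bias: If False, then the layer does not use bias weights b_ih and b_hh. Default: True. That is, whether to use bias terms.
  • batch_first: If True, then the input and output tensors are provided as (batch, seq, feature). Default: False. This sets the layout of the data. With the default False, the layout is (seq (num_steps), batch, input_dim), with the sequence length first and the batch second; setting it to True moves the batch dimension first, giving (batch, seq (num_steps), input_dim).
  • dropout: If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0. That is, whether to apply dropout. It is off by default; to enable it, set it to a probability between 0 and 1. Because it is applied between layers, it only has an effect when num_layers is greater than 1.
  • bidirectional: If True, becomes a bidirectional RNN. Default: False. That is, whether to use a bidirectional RNN.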
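
To make the batch_first layout concrete, here is a minimal sketch that builds two otherwise identical layers and compares the tensor shapes. The sizes (5 time steps, batch of 3, 10 input features, 20 hidden units) and the variable names are arbitrary choices for illustration, not values from this post.

```python
import torch
from torch import nn

# Illustrative sizes only.
seq_len, batch_size, input_size, hidden_size = 5, 3, 10, 20

# Default layout (batch_first=False): input is (seq_len, batch, input_size).
rnn_seq_first = nn.RNN(input_size, hidden_size)
out_sf, h_sf = rnn_seq_first(torch.rand(seq_len, batch_size, input_size))

# batch_first=True: input and output are (batch, seq_len, feature).
rnn_batch_first = nn.RNN(input_size, hidden_size, batch_first=True)
out_bf, h_bf = rnn_batch_first(torch.rand(batch_size, seq_len, input_size))

print(out_sf.shape)  # torch.Size([5, 3, 20])
print(out_bf.shape)  # torch.Size([3, 5, 20])
# The hidden state keeps its (num_layers * num_directions, batch, hidden_size)
# layout in both cases.
print(h_sf.shape, h_bf.shape)  # torch.Size([1, 3, 20]) torch.Size([1, 3, 20])
```

Note that batch_first only changes the layout of the input and output tensors; the hidden state h_n is laid out the same way either way.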

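The most important parameters of nn.RNN() are input_size and hidden_size, so make sure you understand these two. The remaining parameters usually do not need to be set and can be left at their defaults.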

  3. Input and output shapes of the RNN
  • Inputs: input, h_0
    - input of shape (seq_len, batch, input_size): tensor containing the features
    of the input sequence. The input can also be a packed variable length
    sequence. See torch.nn.utils.rnn.pack_padded_sequence
    or torch.nn.utils.rnn.pack_sequence
    for details.
    - h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor
    containing the initial hidden state for each element in the batch.
    Defaults to zero if not provided. If the RNN is bidirectional,
    num_directions should be 2, else it should be 1.

  • Outputs: output, h_n
    - output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN,
    for each t. If a torch.nn.utils.rnn.PackedSequence has
    been given as the input, the output will also be a packed sequence.
    For the unpacked case, the directions can be separated
    using output.view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively.
    Similarly, the directions can be separated in the packed case.
    - h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len.
    Like output, the layers can be separated using
    h_n.view(num_layers, num_directions, batch, hidden_size).

  • Shape:
    - Input1: $(L, N, H_{in})$ tensor containing input features, where
    $H_{in} = \text{input\_size}$ and $L$ represents a sequence length.
    - Input2: $(S, N, H_{out})$ tensor
    containing the initial hidden state for each element in the batch,
    where $H_{out} = \text{hidden\_size}$ and $S = \text{num\_layers} * \text{num\_directions}$.
    Defaults to zero if not provided.
    If the RNN is bidirectional, num_directions should be 2, else it should be 1.
    - Output1: $(L, N, H_{all})$ where $H_{all} = \text{num\_directions} * \text{hidden\_size}$
    - Output2: $(S, N, H_{out})$ tensor containing the next hidden state for each element in the batch
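
As a rough check of the shapes listed above, the following sketch runs a 2-layer bidirectional RNN and separates the directions with view, as the docstring describes. The sizes and variable names are made up for illustration.

```python
import torch
from torch import nn

# Illustrative sizes only.
seq_len, batch_size, input_size, hidden_size = 5, 3, 10, 20
num_layers, num_directions = 2, 2

rnn = nn.RNN(input_size, hidden_size, num_layers=num_layers, bidirectional=True)
x = torch.rand(seq_len, batch_size, input_size)
output, h_n = rnn(x)  # h_0 is omitted, so it defaults to zeros

print(output.shape)  # torch.Size([5, 3, 40]) = (L, N, num_directions * hidden_size)
print(h_n.shape)     # torch.Size([4, 3, 20]) = (num_layers * num_directions, N, hidden_size)

# Separate the two directions of the last layer's output.
out_dirs = output.view(seq_len, batch_size, num_directions, hidden_size)
forward_out, backward_out = out_dirs[:, :, 0], out_dirs[:, :, 1]

# Separate h_n by layer and direction.
h_layers = h_n.view(num_layers, num_directions, batch_size, hidden_size)

# The forward direction finishes at the last time step,
# the backward direction finishes at the first time step.
fwd_match = torch.allclose(forward_out[-1], h_layers[-1, 0])
bwd_match = torch.allclose(backward_out[0], h_layers[-1, 1])
print(fwd_match, bwd_match)
```

The last two checks show where each direction's final hidden state sits inside output: at the last time step for the forward pass and at the first time step for the backward pass.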

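Input shape: input_shape = [number of time steps, batch size, feature dimension] = [num_steps (seq_length), batch_size, input_size]

The forward pass returns an output o and a hidden state h. The output o is the hidden state that the hidden layer computes at each time step; it usually serves as the input to a subsequent output layer. Note that this "output" does not itself involve any output-layer computation. For a unidirectional RNN its shape is output_shape = [number of time steps, batch size, number of hidden units] = [num_steps (seq_length), batch_size, hidden_size].

The hidden state h is the hidden state of the hidden layer at the last time step. When there are several hidden layers, the hidden state of every layer is recorded in this variable. For a long short-term memory (LSTM) network, the hidden state is a tuple (h, c), i.e. the hidden state and the cell state; a plain RNN has only the single value h. For a unidirectional RNN the shape of h is hidden_shape = [number of layers, batch size, number of hidden units] = [num_layers, batch_size, hidden_size].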
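
The relationship between the two return values can be sketched as follows, assuming a unidirectional 2-layer RNN. The sizes are arbitrary, and the nn.Linear output layer (named head here) is a hypothetical addition used only to show where the output would typically go next.

```python
import torch
from torch import nn

rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=2)
x = torch.rand(5, 3, 10)  # (num_steps, batch_size, input_size)
output, h_n = rnn(x)

print(output.shape)  # torch.Size([5, 3, 20]): last layer, every time step
print(h_n.shape)     # torch.Size([2, 3, 20]): every layer, last time step

# The last layer's final hidden state is the last time step of output.
same = torch.allclose(output[-1], h_n[-1])
print(same)

# Hypothetical output layer applied to the hidden state of every time step.
head = nn.Linear(20, 4)
logits = head(output)
print(logits.shape)  # torch.Size([5, 3, 4])
```

In other words, output covers all time steps of the top layer only, while h_n covers all layers at the final time step only; the two overlap in exactly one slice.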

Code

import torch
from torch import nn

vocab_size, num_hiddens = 1027, 256
rnn_layer = nn.RNN(input_size=vocab_size, hidden_size=num_hiddens)

This defines the model with vocab_size = 1027 and hidden_size = 256.

num_steps = 35
batch_size = 2
state = None    # the initial hidden state may be left undefined (defaults to zeros)
X = torch.rand(num_steps, batch_size, vocab_size)
Y, state_new = rnn_layer(X, state)
print(Y.shape, len(state_new), state_new.shape)

Output

torch.Size([35, 2, 256])     1       torch.Size([1, 2, 256])

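The computation at each time step is:
$H_t = \tanh(X_t W_{xh} + H_{t-1} W_{hh} + b)$
where tanh is the default nonlinearity and is applied elementwise, so it does not change the shape.
To make the dimensions easy to follow, assume num_steps = 1. The shapes then combine as follows:
[batch_size, input_size] * [input_size, hidden_size] + [batch_size, hidden_size] * [hidden_size, hidden_size] + bias
Each hidden state therefore has shape [batch_size, hidden_size], and so does the output at each time step.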
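
The recurrence can be checked by hand against nn.RNN. The sketch below assumes a single-layer, unidirectional RNN with the default tanh nonlinearity and reads the layer's own parameters (weight_ih_l0, weight_hh_l0, bias_ih_l0, bias_hh_l0). The small sizes and the names W_xh, W_hh, and manual are chosen here for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
seq_len, batch_size, input_size, hidden_size = 4, 2, 3, 5  # illustrative sizes
rnn = nn.RNN(input_size, hidden_size)  # one layer, tanh, with bias
x = torch.rand(seq_len, batch_size, input_size)

with torch.no_grad():
    expected, _ = rnn(x)

    # PyTorch stores the weights as (hidden_size, input_size) and
    # (hidden_size, hidden_size), so they are transposed here to match
    # the X @ W form used in the text.
    W_xh = rnn.weight_ih_l0.T
    W_hh = rnn.weight_hh_l0.T
    bias = rnn.bias_ih_l0 + rnn.bias_hh_l0  # the two bias vectors simply add

    H = torch.zeros(batch_size, hidden_size)  # H_0 defaults to zeros
    steps = []
    for t in range(seq_len):
        # [batch, input] @ [input, hidden] + [batch, hidden] @ [hidden, hidden] + bias
        H = torch.tanh(x[t] @ W_xh + H @ W_hh + bias)
        steps.append(H)
    manual = torch.stack(steps)  # (seq_len, batch_size, hidden_size)

print(manual.shape)
print(torch.allclose(manual, expected, atol=1e-5))
```

Stacking the per-step hidden states reproduces the layer's output tensor, which is why output has shape [num_steps, batch_size, hidden_size].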

You can also read the source file rnn.py for a closer look at the implementation.

Reference: https://blog.csdn.net/orangerfun/article/details/103934290
