Day07: Implementing the GRU Network Structure by Hand

Original article, last modified 2025-04-11 15:16:05

This content is licensed under CC 4.0 BY-SA.

Filed under: Natural Language Processing

First published 2025-04-08 19:16:50

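Implementing the GRU Network Structure by Hand

Goal

The goal is to reproduce by hand, using the numpy library, the computation performed by PyTorch's GRU (gated recurrent unit) layer. The GRU is a widely used variant of the recurrent neural network (RNN), mainly applied to sequential data such as time series. The core idea of the code is to extract the weights from a PyTorch network and replicate the GRU computation with NumPy matrix operations.

A detailed explanation follows:

Core equations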

$\begin{array}{ll} r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\ z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{(t-1)} + b_{hz}) \\ n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)}+ b_{hn})) \\ h_t = (1 - z_t) * n_t + z_t * h_{(t-1)} \end{array}$

Original paper: GRU paper

1. Import the required libraries

import torch
import torch.nn as nn
import numpy as np

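torch is PyTorch's core library, used to build neural network models.
torch.nn contains the neural network layers and utilities; nn.GRU is the class that defines a GRU layer.
numpy is Python's numerical computing library, used for efficient matrix operations.

2. The sigmoid function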

def sigmoid(x):
    return 1/(1 + np.exp(-x))

This implements the sigmoid activation function. In RNNs, sigmoid is commonly used to compute gate activations; it maps its input into the open interval (0, 1).
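The one-line sigmoid above is fine for the small values in this article, but np.exp(-x) can overflow for large negative x. As a hedged aside, the sketch below shows one common way to avoid that. The name stable_sigmoid is an assumption introduced for illustration and is not part of the original code.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that avoids overflow in np.exp for large-magnitude inputs.

    Hypothetical helper for illustration; expects a NumPy array.
    """
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) <= 1, so this form cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, use the equivalent form exp(x) / (1 + exp(x)); exp(x) < 1 here.
    exp_x = np.exp(x[~pos])
    out[~pos] = exp_x / (1.0 + exp_x)
    return out

vals = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
print(vals)
```

For inputs of moderate size, both versions agree to floating-point precision, so the simple version is kept in the rest of the article.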

3. The numpy_gru function

def numpy_gru(x, state_dict):

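This function simulates the computation of a GRU layer. The argument x is the input sequence (length time steps, each with input_dim features), and state_dict is the dictionary of weights and biases taken from the PyTorch GRU network.

3.1. Extract the weights and biases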

weight_ih = state_dict["weight_ih_l0"].numpy()
weight_hh = state_dict["weight_hh_l0"].numpy()
bias_ih = state_dict["bias_ih_l0"].numpy()
bias_hh = state_dict["bias_hh_l0"].numpy()

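weight_ih: the input-to-hidden weights (shape 3*hidden_size x input_dim), holding the weights for the reset gate, the update gate, and the candidate hidden state.
weight_hh: the hidden-to-hidden weights (shape 3*hidden_size x hidden_size).
bias_ih and bias_hh are the corresponding biases.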
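To check these shapes yourself, you can list the entries of the layer's state_dict. The following is a minimal sketch, using the same sizes as the main program in this article (input_dim = 12, hidden_size = 7) and a single-layer, unidirectional nn.GRU.

```python
import torch.nn as nn

# Sizes chosen to match the main program in this article.
input_dim, hidden_size = 12, 7
gru = nn.GRU(input_dim, hidden_size, batch_first=True)

# Collect the shape of every parameter stored in the state_dict.
shapes = {name: tuple(t.shape) for name, t in gru.state_dict().items()}
for name, shape in shapes.items():
    print(name, shape)
```

The first dimension of every entry is 3 * hidden_size = 21, because the three blocks are stacked along the rows.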

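3.2. Split the weights and biases

PyTorch stores the weights of the GRU's three parts (reset gate, update gate, candidate hidden state) stacked in a single matrix, so we split them apart below.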

w_r_x, w_z_x, w_x = weight_ih[0:hidden_size, :], \
                        weight_ih[hidden_size:hidden_size * 2, :],\
                        weight_ih[hidden_size * 2:hidden_size * 3, :]
w_r_h, w_z_h, w_h = weight_hh[0:hidden_size, :], \
                        weight_hh[hidden_size:hidden_size * 2, :], \
                        weight_hh[hidden_size * 2:hidden_size * 3, :]
b_r_x, b_z_x, b_x = bias_ih[0:hidden_size], \
                        bias_ih[hidden_size:hidden_size * 2], \
                        bias_ih[hidden_size * 2:hidden_size * 3]
b_r_h, b_z_h, b_h = bias_hh[0:hidden_size], \
                        bias_hh[hidden_size:hidden_size * 2], \
                        bias_hh[hidden_size * 2:hidden_size * 3]

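w_r_x, w_z_x, w_x are the input-to-hidden weights for the reset gate, the update gate, and the candidate hidden state.
w_r_h, w_z_h, w_h are the hidden-to-hidden weights for the reset gate, the update gate, and the candidate hidden state.
b_r_x, b_z_x, b_x and the others are the corresponding biases.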
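Because the three blocks have equal height, the manual slicing can also be written with np.split. The sketch below uses a random matrix as a stand-in for state_dict["weight_ih_l0"].numpy(); the values are made up for illustration and only the shapes matter.

```python
import numpy as np

hidden_size, input_dim = 7, 12
rng = np.random.default_rng(0)
# Stand-in for state_dict["weight_ih_l0"].numpy(); random values, illustration only.
weight_ih = rng.standard_normal((3 * hidden_size, input_dim))

# Split into three equal row blocks, in the same (reset, update, candidate)
# order that the slices above assume.
w_r_x, w_z_x, w_x = np.split(weight_ih, 3, axis=0)

print(w_r_x.shape, w_z_x.shape, w_x.shape)
```

Both forms pick out the same rows. The explicit slices make the row ranges visible, while np.split is shorter and raises an error if the matrix cannot be divided evenly.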

3.3. Concatenate the gate weights and sum the biases

w_z = np.concatenate([w_z_h, w_z_x], axis=1)
w_r = np.concatenate([w_r_h, w_r_x], axis=1)
b_z = b_z_h + b_z_x
b_r = b_r_h + b_r_x

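w_z and w_r are the weights of the update gate and the reset gate; each is the hidden-to-hidden block concatenated with the input-to-hidden block along the column axis.
b_z and b_r are the biases of the update gate and the reset gate; unlike the weights, the two bias vectors are summed rather than concatenated.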
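The concatenation works because one product over [h_t, x_t] and [w_z_h, w_z_x] equals the sum of the two separate products in the formula for z_t. Here is a small numerical check of that identity, with random arrays standing in for the real weights (all values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, input_dim = 7, 12
h_t = rng.standard_normal((1, hidden_size))
x_t = rng.standard_normal((1, input_dim))
w_z_h = rng.standard_normal((hidden_size, hidden_size))
w_z_x = rng.standard_normal((hidden_size, input_dim))
b_z_h = rng.standard_normal(hidden_size)
b_z_x = rng.standard_normal(hidden_size)

# Two separate products, as written in the formula for z_t (before the sigmoid).
separate = x_t @ w_z_x.T + b_z_x + h_t @ w_z_h.T + b_z_h

# One product over concatenated operands, as in the code above.
hx = np.concatenate([h_t, x_t], axis=1)
w_z = np.concatenate([w_z_h, w_z_x], axis=1)
fused = hx @ w_z.T + (b_z_h + b_z_x)

print(np.allclose(separate, fused))
```

The order inside the two concatenations must match: h_t comes first in hx, so w_z_h must come first in w_z.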

3.4. Initialize the hidden state

h_t = np.zeros((1, hidden_size))

The hidden state h_t is initialized to a zero matrix of shape (1, hidden_size).

3.5. Loop over the time steps and compute each output

for x_t in x:
    x_t = x_t[np.newaxis, :]
    hx = np.concatenate([h_t, x_t], axis=1)
    z_t = sigmoid(np.dot(hx, w_z.T) + b_z)
    r_t = sigmoid(np.dot(hx, w_r.T) + b_r)
    h = np.tanh(r_t * (np.dot(h_t, w_h.T) + b_h) + np.dot(x_t, w_x.T) + b_x)
    h_t = (1 - z_t) * h + z_t * h_t
    sequence_output.append(h_t)

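For each time step's input x_t, first concatenate the hidden state h_t with x_t.
Then compute the update gate z_t and the reset gate r_t, both obtained through the sigmoid function.
Next compute the candidate hidden state h: the reset gate r_t scales the hidden-state term (h_t times w_h, plus b_h), the input term is added, and the sum is passed through tanh.
Finally, update the hidden state h_t under the control of z_t: in (1 - z_t) * h + z_t * h_t, z_t is the fraction of the previous state that is kept and (1 - z_t) is the fraction taken from the candidate.

The result of each time step is appended to sequence_output.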
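To make the role of the update gate concrete, the sketch below evaluates the final blending step at three gate values. The numbers and the helper name blend are assumptions chosen for illustration.

```python
import numpy as np

h_prev = np.array([[0.5, -0.5]])   # previous hidden state (made-up values)
n_t = np.array([[0.9, 0.1]])       # candidate hidden state (made-up values)

def blend(z_t, n_t, h_prev):
    # Same convex combination as the last update in the loop above.
    return (1 - z_t) * n_t + z_t * h_prev

keep_old = blend(np.ones((1, 2)), n_t, h_prev)       # z_t = 1: state unchanged
take_new = blend(np.zeros((1, 2)), n_t, h_prev)      # z_t = 0: state replaced
half = blend(np.full((1, 2), 0.5), n_t, h_prev)      # z_t = 0.5: average of both

print(keep_old, take_new, half)
```

With z_t near 1 the layer carries its state forward almost unchanged, which is what lets a GRU retain information over many time steps.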

4. Main program

if __name__ == "__main__":
    # Build an input
    length = 6
    input_dim = 12
    hidden_size = 7
    x = np.random.random((length, input_dim))
    
    # Use PyTorch's GRU layer
    torch_gru = nn.GRU(input_dim, hidden_size, batch_first=True)
    torch_sequence_output, torch_h = torch_gru(torch.Tensor([x]))
    numpy_sequence_output, numpy_h = numpy_gru(x, torch_gru.state_dict())
    print(torch_sequence_output)
    print(numpy_sequence_output)
    print("--------")
    print(torch_h)
    print(numpy_h)

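The input x has shape (length, input_dim): 6 time steps with 12 features each.
A PyTorch GRU layer, torch_gru, is created and applied to x, giving torch_sequence_output and torch_h (the final hidden state).
The same computation is carried out in NumPy by numpy_gru, giving numpy_sequence_output and numpy_h.
Finally, the PyTorch and NumPy outputs are printed so that they can be compared.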
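Comparing printed numbers by eye works for a small example, but a numerical check is less error-prone. The sketch below is one possible way to do it. The compact function numpy_gru_compact is a hypothetical rewrite of the same equations, not the article's numpy_gru, and the tolerance 1e-5 is an assumption that allows for float32 rounding in PyTorch.

```python
import numpy as np
import torch
import torch.nn as nn

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def numpy_gru_compact(x, state_dict):
    """Compact NumPy GRU forward pass following the equations in this article."""
    w_ih = state_dict["weight_ih_l0"].numpy()
    w_hh = state_dict["weight_hh_l0"].numpy()
    b_ih = state_dict["bias_ih_l0"].numpy()
    b_hh = state_dict["bias_hh_l0"].numpy()
    h = np.zeros(w_hh.shape[1])
    outputs = []
    for x_t in x:
        # Each product yields the r, z and n pre-activations stacked together.
        gi_r, gi_z, gi_n = np.split(w_ih @ x_t + b_ih, 3)
        gh_r, gh_z, gh_n = np.split(w_hh @ h + b_hh, 3)
        r = sigmoid(gi_r + gh_r)
        z = sigmoid(gi_z + gh_z)
        n = np.tanh(gi_n + r * gh_n)
        h = (1 - z) * n + z * h
        outputs.append(h)
    return np.stack(outputs), h

torch.manual_seed(0)
x = np.random.default_rng(0).random((6, 12)).astype(np.float32)
gru = nn.GRU(12, 7, batch_first=True)
with torch.no_grad():
    torch_out, torch_h = gru(torch.from_numpy(x).unsqueeze(0))
numpy_out, numpy_h = numpy_gru_compact(x, gru.state_dict())

# torch_out has shape (1, 6, 7); drop the batch axis before comparing.
max_diff = np.abs(torch_out[0].numpy() - numpy_out).max()
print("max abs difference:", max_diff)
print(np.allclose(torch_out[0].numpy(), numpy_out, atol=1e-5))
```

The two results should agree only up to rounding, not bit for bit, since PyTorch computes in float32 while part of the NumPy computation runs in float64.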

The complete code:


import torch
import torch.nn as nn
import numpy as np

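'''
Reproduce some basic model structures using matrix operations.
Knowing the computational details of a model deepens understanding of it,
and helps with tasks such as model conversion.
'''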
def sigmoid(x):
    return 1/(1 + np.exp(-x))


# Take the weights out of a PyTorch GRU network and reproduce the GRU computation with NumPy matrix operations
def numpy_gru(x, state_dict):
    weight_ih = state_dict["weight_ih_l0"].numpy()
    weight_hh = state_dict["weight_hh_l0"].numpy()
    bias_ih = state_dict["bias_ih_l0"].numpy()
    bias_hh = state_dict["bias_hh_l0"].numpy()
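    # Derive hidden_size from the weights instead of relying on a module-level variable
    hidden_size = weight_hh.shape[1]
    # PyTorch stores the weights of the three gates stacked together; split them apart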
    w_r_x, w_z_x, w_x = weight_ih[0:hidden_size, :], \
                        weight_ih[hidden_size:hidden_size * 2, :],\
                        weight_ih[hidden_size * 2:hidden_size * 3, :]
    w_r_h, w_z_h, w_h = weight_hh[0:hidden_size, :], \
                        weight_hh[hidden_size:hidden_size * 2, :], \
                        weight_hh[hidden_size * 2:hidden_size * 3, :]
    b_r_x, b_z_x, b_x = bias_ih[0:hidden_size], \
                        bias_ih[hidden_size:hidden_size * 2], \
                        bias_ih[hidden_size * 2:hidden_size * 3]
    b_r_h, b_z_h, b_h = bias_hh[0:hidden_size], \
                        bias_hh[hidden_size:hidden_size * 2], \
                        bias_hh[hidden_size * 2:hidden_size * 3]
    w_z = np.concatenate([w_z_h, w_z_x], axis=1)
    w_r = np.concatenate([w_r_h, w_r_x], axis=1)
    b_z = b_z_h + b_z_x
    b_r = b_r_h + b_r_x
    h_t = np.zeros((1, hidden_size))
    sequence_output = []
    for x_t in x:
        x_t = x_t[np.newaxis, :]
        hx = np.concatenate([h_t, x_t], axis=1)
        z_t = sigmoid(np.dot(hx, w_z.T) + b_z)
        r_t = sigmoid(np.dot(hx, w_r.T) + b_r)
        h = np.tanh(r_t * (np.dot(h_t, w_h.T) + b_h) + np.dot(x_t, w_x.T) + b_x)
        h_t = (1 - z_t) * h + z_t * h_t
        sequence_output.append(h_t)
    return np.array(sequence_output), h_t


if __name__ == "__main__":
    # Build an input
    length = 6
    input_dim = 12
    hidden_size = 7
    x = np.random.random((length, input_dim))
    # print(x)
    # Use PyTorch's GRU layer
    torch_gru = nn.GRU(input_dim, hidden_size, batch_first=True)
    # for key, weight in torch_gru.state_dict().items():
    #     print(key, weight.shape)
    torch_sequence_output, torch_h = torch_gru(torch.Tensor([x]))
    numpy_sequence_output, numpy_h = numpy_gru(x, torch_gru.state_dict())
    print(torch_sequence_output)
    print(numpy_sequence_output)
    print("--------")
    print(torch_h)
    print(numpy_h)

Output comparison:

tensor([[[ 0.0898,  0.1264, -0.3715, -0.0054, -0.1244,  0.2537,  0.1068],
         [ 0.2747,  0.1242, -0.4658, -0.0519, -0.2775,  0.3470,  0.3414],
         [ 0.1486,  0.3405, -0.4939, -0.0855, -0.1149,  0.3915,  0.3905],
         [ 0.1688,  0.2452, -0.4569,  0.1186, -0.3367,  0.3604,  0.2794],
         [ 0.2205,  0.2223, -0.2971,  0.0032, -0.3159,  0.3217,  0.3331],
         [ 0.3242,  0.1157, -0.1913, -0.1094, -0.4981,  0.4574,  0.4356]]],
       grad_fn=<TransposeBackward1>)
[[[ 0.0898004   0.12642607 -0.37150564 -0.00538398 -0.12441828
    0.25367489  0.10680404]]

 [[ 0.2746723   0.12416214 -0.46581426 -0.05186559 -0.27751937
    0.34703625  0.3414097 ]]

 [[ 0.1485713   0.34053725 -0.49385681 -0.0855477  -0.11494202
    0.39153359  0.39050793]]

 [[ 0.16876234  0.2452394  -0.45690767  0.11860361 -0.33673728
    0.36042564  0.27941383]]

 [[ 0.22054224  0.2223021  -0.29708877  0.00323797 -0.31586235
    0.32166219  0.33313443]]

 [[ 0.32415161  0.11565356 -0.1913359  -0.10939028 -0.49810386
    0.45740738  0.43556429]]]
--------
tensor([[[ 0.3242,  0.1157, -0.1913, -0.1094, -0.4981,  0.4574,  0.4356]]],
       grad_fn=<StackBackward0>)
[[ 0.32415161  0.11565356 -0.1913359  -0.10939028 -0.49810386  0.45740738
   0.43556429]]

Process finished with exit code 0