diff --git a/docs/api/paddle/jit/to_static_cn.rst b/docs/api/paddle/jit/to_static_cn.rst index 4296d855e4d..c92fc2c2b5c 100644 --- a/docs/api/paddle/jit/to_static_cn.rst +++ b/docs/api/paddle/jit/to_static_cn.rst @@ -7,17 +7,17 @@ to_static 本装饰器将函数内的动态图 API 转化为静态图 API。此装饰器自动处理静态图模式下的 Program 和 Executor,并将结果作为动态图 Tensor 返回。输出的动态图 Tensor 可以继续进行动态图训练、预测或其他运算。如果被装饰的函数里面调用其他动态图函数,被调用的函数也会被转化为静态图函数。 + 参数 :::::::::::: - **function** (callable) - 待转换的动态图函数。若以装饰器形式使用,则被装饰函数默认会被解析为此参数值,无需显式指定。 - **input_spec** (list[InputSpec]|tuple[InputSpec]) - 用于指定被装饰函数中输入 Tensor 的 shape、dtype 和 name 信息,为包含 InputSpec 的 list/tuple 类型。 - - **build_strategy** (BuildStrategy|None):通过配置 :attr:`build_strategy`,对转换后的计算图进行优化,例如:计算图中算子融合、计算图执行过程中开启内存/显存优化等。关于 :attr:`build_strategy` 更多信息,请参阅 :ref:`paddle.static.BuildStrategy `。默认为 ``None``。 - - **backend** (str,可选): 指定后端编译器,可以指定为 ``"CINN"`` 或者 ``None``。当该参数指定为 ``"CINN"`` 时,将会使用 ``CINN`` 编译器来加速训练和推理。 - - **kwargs**: 支持的 key 包括 :attr:`property` 和 :attr:`full_graph`。 + - **build_strategy** (BuildStrategy|None):通过配置 build_strategy,对转换后的计算图进行优化,例如:计算图中算子融合、计算图执行过程中开启内存/显存优化等。关于 build_strategy 更多信息,请参阅 ``paddle.static.BuildStrategy``。默认为 None。 + - **backend** (str,可选): 指定后端编译器,可以指定为 `CINN` 或者 None。当该参数指定为 `CINN` 时,将会使用 CINN 编译器来加速训练和推理。 + - **kwargs**: 支持的 key 包括 `property` - - **property** (bool): 表示被装饰的函数是否以 class property 属性的方式进行导出,默认为 ``False``。 - - **full_graph** (bool): 表示被装饰的函数是否以整图静态图的方式进行导出,默认为 ``False``。 + - **property**: 表示被装饰的函数是否以 class property 属性的方式进行导出 代码示例 diff --git a/docs/guides/infer/index_cn.rst b/docs/guides/infer/index_cn.rst index ac8e8db2502..0b4338bff35 100644 --- a/docs/guides/infer/index_cn.rst +++ b/docs/guides/infer/index_cn.rst @@ -15,9 +15,9 @@ :header: "名称", "英文表示", "适用场景" :widths: 10, 10, 30 - "飞桨原生推理库", "`Paddle Inference `_ ", "高性能服务器端、云端推理" + "飞桨原生推理库", "`Paddle Inference `_ ", "高性能服务器端、云端推理" "飞桨服务化推理框架", "Paddle Serving", "自动服务、模型管理等高阶功能" - "飞桨轻量化推理引擎", "`Paddle Lite `_ ", "移动端、物联网等" + "飞桨轻量化推理引擎", "`Paddle Lite `_ 
", "移动端、物联网等" "飞桨前端推理引擎", "Paddle.js", "浏览器中推理、小程序等" diff --git a/docs/guides/paddle_v3_features/auto_parallel_cn.md b/docs/guides/paddle_v3_features/auto_parallel_cn.md index 4e9fcc3d60c..b85e8770d0a 100644 --- a/docs/guides/paddle_v3_features/auto_parallel_cn.md +++ b/docs/guides/paddle_v3_features/auto_parallel_cn.md @@ -751,8 +751,8 @@ opt = paddle.optimizer.AdamW(learning_rate=0.001, parameters=model.parameters()) opt = dist.shard_optimizer(opt) # 在模型训练阶段开始前加载 -dist.save_state_dict(model.state_dict(), './ckpt/model') -dist.save_state_dict(opt.state_dict(), './ckpt/opt') +dist.load_state_dict(model.state_dict(), './ckpt/model') +dist.load_state_dict(opt.state_dict(), './ckpt/opt') for step, inputs in enumerate(dataloader): data = inputs diff --git a/docs/guides/paddle_v3_features/cinn_cn.md b/docs/guides/paddle_v3_features/cinn_cn.md index 1e0eb35f5a4..da24066193c 100644 --- a/docs/guides/paddle_v3_features/cinn_cn.md +++ b/docs/guides/paddle_v3_features/cinn_cn.md @@ -73,16 +73,10 @@ print(out) # 打开组合算子 export FLAGS_prim_enable_dynamic=true && export FLAGS_prim_all=true -# 打开 CINN 编译器相关 FLAG +# 打开 CINN 编译器 export FLAGS_use_cinn=true -export FLAGS_cinn_new_group_scheduler=true -export FLAGS_group_schedule_tiling_first=true -export FLAGS_cinn_bucket_compile=true -# 打开 PIR 模式 -export FLAGS_enable_pir_api=true - -# 是否打印 Program IR 信息 +# 是否打印 Program IR 信息 (用于调试) export FLAGS_print_ir=false python run_net.py @@ -90,7 +84,7 @@ python run_net.py 上述代码示例中我们创建了一个简单的`rms_norm`计算子图,使用飞桨的动转静流程将子图转为静态图并调用编译器 CINN 进行优化和执行。经过性能对比测试,在 A100 GPU 环境中上述子图使用 CINN 可以取得 3 倍左右的性能提升(该性能数据仅供学习参考,在实际应用模型中能够取得的性能提升效果一般会低于该数据)。 -注:由于飞桨的编译器仍然处在快速迭代开发阶段,我们设置了较多 FLAGS 进行分支的选择和调试,因此现阶段在使用 CINN 时需要对如下 FLAGS(`FLAGS_prim_enable_dynamic`、 `FLAGS_cinn_new_group_scheduler`、 `FLAGS_group_schedule_tiling_first`、 `FLAGS_cinn_bucket_compile`、 `FLAGS_enable_pir_api`) 进行手动设置,待后续相关功能完备后这些 FLAGS 会默认开启,无需再手动设置。 +注:由于飞桨的编译器仍然处在快速迭代开发阶段,我们设置了多个 FLAGS 进行分支的选择和调试,因此现阶段在使用 CINN 时需要对如下 FLAGS( 
`FLAGS_use_cinn`、`FLAGS_prim_enable_dynamic`、`FLAGS_prim_all`) 进行手动设置,待后续相关功能完备后动转静流程将默认开启这些 FLAGS,无需再手动设置。 ## 四、设计架构
+
图 3 Modulus 系列模型性能对比数据

+ +2. PaddleX 模型: + * PaddleX 系列 60+ 模型使用 CINN 编译器后超 60% 模型有显著性能提升,平均提升达 27.4%。部分重点模型相比 Pytorch 也有明显性能优势。 +
+
图 4 部分重点模型单机 8 卡 FP16 训练性能对比数据

diff --git a/docs/guides/paddle_v3_features/higher_order_ad_cn.md b/docs/guides/paddle_v3_features/higher_order_ad_cn.md index 076c1232b29..f29a533d4a3 100644 --- a/docs/guides/paddle_v3_features/higher_order_ad_cn.md +++ b/docs/guides/paddle_v3_features/higher_order_ad_cn.md @@ -16,11 +16,11 @@ ## 二、设计思想 -高阶自动微分的实现面临诸多挑战。具体而言,框架需要为每个算子编写高阶微分规则。随着阶数的增加,微分规则的复杂性也随之上升。当阶数达到三阶或更高时,编写这些规则变得极其困难,同时正确性难以保证。为了解决这一问题,飞桨提出了基于基础算子组合的高阶自动微分技术。该技术的关键在于将复杂算子(如 log_softmax)拆解为多个基础算子的组合。然后,我们对这些基础算子进行一阶自动微分变换。重要的是,基础算子经过一阶自动微分变换后,其得到的计算图仍然是由基础算子所构成。通过反复应用一阶自动微分规则,我们可以轻松地获得高阶自动微分结果。 +高阶自动微分的实现面临诸多挑战。具体而言,框架需要为每个算子编写高阶微分规则。随着阶数的增加,微分规则的复杂性也随之上升。当阶数达到三阶或更高时,编写这些规则变得极其困难,同时正确性难以保证。为了解决这一问题,飞桨提出了基于基础算子组合的高阶自动微分技术。该技术的关键在于将复杂算子(如 `log_softmax`)拆解为多个基础算子的组合。然后,我们对这些基础算子进行一阶自动微分变换。重要的是,基础算子经过一阶自动微分变换后,其得到的计算图仍然是由基础算子所构成。通过反复应用一阶自动微分规则,我们可以轻松地获得高阶自动微分结果。 **log_softmax 拆解与微分示例** -根据 log_softmax 表达式拆解为 exp、max、log 等细粒度基础算子组成,基础算子是指由简单运算逻辑组成的有限集合,数量较少。基于飞桨的自动微分体系,使用基础算子的微分规则自动推导 log_softmax 一阶微分,注意基础算子微分规则仍由基础算子实现,因此 log_softmax 的一阶微分仍由基础算子组成。重复上述微分过程实现 log_softmax 高阶微分求解。 +`log_softmax` 的计算过程可拆解为 `exp`、`max`、`log` 等细粒度的基础算子(基础算子是指由简单、不可再拆分运算逻辑组成的有限集合,数量较少)。而后基于飞桨的自动微分体系,即可使用基础算子的微分规则自动推导出 `log_softmax` 的一阶反向微分。又由于基础算子微分规则仍由基础算子实现,因此 `log_softmax` 的二阶以及更高阶的微分仍然由基础算子组成。综上所述,通过基础算子微分规则的组合,即可实现复杂算子的高阶微分。
@@ -28,172 +28,353 @@ ## 三、框架架构 -为了支持高阶自动微分,飞桨框架精心设计与实现了组合算子机制。这一机制不仅兼容动态图模式和静态图模式,而且在动态图模式下支持 N+1 阶微分的拆分,同时在静态图模式下能够进行编译器融合优化。创新性地设计并实现了动静一体的算子组合规则,这意味着同一套组合规则在动态图和静态图两种模式下均可复用,从而避免了重复开发。在构建基础算子体系时,我们以 Tensor 作为核心操作对象,确保了算子的原子性、实用性和完备性。此外,我们还支持自定义反向操作和自动重计算功能,这些特性不仅提升了模型的精度,还有效地减少了显存占用,为用户提供了更高效、更灵活的深度学习体验。 +为了支持高阶自动微分,飞桨框架精心设计与实现了组合算子机制。这一机制不仅兼容动态图模式和静态图模式,而且在动态图模式下支持 N+1 阶微分的拆分(即第 N 阶微分使用手写算子,第 N+1 阶微分使用组合算子逻辑,最大限度保证存量算子和模型的精度、性能),同时在静态图模式下能够进行编译器融合优化。创新性地设计并实现了动静一体的算子组合规则,这意味着同一套组合规则在动态图和静态图两种模式下均可复用,从而避免了重复开发。在构建基础算子体系时,我们以 Tensor 作为核心操作对象,确保了算子的原子性、实用性和完备性。此外,我们还支持自定义反向操作和自动重计算功能,这些特性不仅提升了模型的精度,还有效地减少了显存占用,为用户提供了更高效、更灵活的深度学习体验。
-**基础算子集合设计** +- 基础算子集合设计 基础算子集合的设计需要兼顾通用性、计算效率、易用性和兼容性,此外,还需要具备可扩展性,以便可以方便地添加新的数据处理操作和模型,并可以组合支撑更加复杂的计算工作。飞桨制定了基础算子集合设计原则,1)原子性,即基础算子的操作不能拆分为更基础的操作,如不能把大于等于拆分为不小于;2)实用性,基础算子有实际应用场景;3)面向张量,基础算子的操作粒度为张量,如果一个算子需要在张量的元素粒度上进行复杂操作,则这个算子本身应为基础算子;4)完备性,可以支持复杂算子拆分需求。基于上述原则设计和实现基础算子集合,最终预期基础算子规模约控制到 200 左右,当前还在持续演进中。 -**动静一体组合规则** +- 动静一体组合规则 组合规则是指使用基础算子接口组合实现的复杂算子集合,为了能够在动态图和静态图体系下复用同一套组合规则,减少编码工作量,在基础算子层,设计一套抽象接口,屏蔽动态图基础算子和静态图基础算子实现细节,组合规则的实现调用抽象接口实现,并设计一套分发机制,根据动态图和静态图数据类型的不同进行分发到具体基础算子执行,从而实现动态图和静态图不同模式下组合规则的复用。 -**从机制上保障性能** +- 从机制上保障性能 随着算子细粒度拆分,算子数量会急剧膨胀,算子调度开销也会加大。动态图模式算子动态执行,无法提前优化,为了减少算子拆分造成的动态图性能损耗,飞桨采取了拆解 N+1 阶算子方法。即如果现有算子已经实现了 N 阶反向大算子,为了保证现有模型性能不降低,会实现第 N+1 阶算子的拆解逻辑。在调度上优先运行 1~N 阶大算子,在第 N+1 阶才会拆解成基础算子,保证性能同时支持高阶微分。静态图模式下,由于可以提前整图优化,基于飞桨编译器技术进行图层调度优化和算子融合优化,并且由于算子粒度更细,存在优化空间更大,部分模型上基于组合算子体系和编译器优化的模型性能已经超越了原有大算子体系下模型性能。 -**从机制上保障显存和精度** +- 从机制上保障显存和精度 模型执行过程通常是先执行前向计算,并保存反向计算依赖的中间变量,反向计算复用这些中间变量进行计算。算子细粒度拆分,使需要保存的中间变量急剧增大,模型运行需要的显存大幅增加。飞桨使用自定义反向技术解决该问题,对于一个复杂大算子,支持自定义其反向微分规则,该微分规则实现只依赖其前向大算子的输入输出,并在框架调度上优先保障走该算子的自定义反向微分,而非自动推导的微分规则,从而减少中间变量,降低显存。 -## 四、开始使用 +## 四、二维平板分布受载问题 -飞桨提供了完善高阶自动微分求解 API,包括通用反向微分求解 paddle.grad,多元函数雅可比矩阵计算 `paddle.autograd.jacobian` ,多元函数海森矩阵计算 `paddle.autograd.hessian`. 功能与链接具体参考 4.1. 
+### 4.1 问题描述 -下面通过一个简单示例演示飞桨高阶自动微分用法。 +基于上述飞桨的高阶自动微分能力,接下来尝试解决在第一章提到的“2D 矩形平板分布受载问题”。首先列出该问题的数学模型: -**第一步:导入依赖** +薄板小挠度理论的基本方程为: +$$ +\frac{\partial^4 w}{\partial x^4}+2 \frac{\partial^4 w}{\partial x^2 \partial y^2}+\frac{\partial^4 w}{\partial y^4}=\frac{q}{D} +$$ -```python -import paddle -``` - -**第二步:编写组网代码** +其中 $w(x,y)$ 表示薄板的挠度,即薄板在垂直载荷作用下的变形或偏移量,$x,y$ 表示薄板在平面内的坐标,$D$ 为薄板的弯曲刚度,$q$ 是作用在薄板上的面载荷,表示每单位面积上的外部载荷。 -以单层的全联接网络为例,MyNet 继承自 paddle.nn.Layer,在__init__方法中初始化网络参数,在 forward 方法中实现前向运行逻辑。注意,当前高阶自动微分支持大部分飞桨常用 API,覆盖主流的科学计算模型,如果您在写新的模型遇到飞桨高阶微分问题,可通过飞桨 ISSUE 反馈。 - -```python -class MyNet(paddle.nn.Layer): - def __init__(self): - super().__init__() - self.linear = paddle.nn.Linear(2, 2) - - def forward(self, x): - y = self.linear(x) - return paddle.tanh(y) -``` +在本问题中,矩形薄板 $x$ 方向长 $2m$,$y$ 方向宽 $1m$,板厚 $10mm$,$x$ 方向左右两边处于简支状态(可以转动但不能位移),$y$ 方向上下两边自由(没有任何约束,可以自由移动和转动)。 -**第三步:创建网络及声明输入数据,执行前向计算过程** +左右两边 $(x=-1 \mid x=+1)$ 为简支边界条件,因此挠度 $w$ 和弯矩 $M_x$ 都为 $0$ : -```python -x = paddle.randn([2, 2]) -x.stop_gradient = False # 允许计算对 x 的梯度 -net = MyNet() -y = net(x) -``` +$$ +(w)\_{x=-1 \mid x=+1}=0, \quad\left(M\_x\right)\_{x=-1 \mid x=+1}=0 +$$ -**第四步:计算 Loss** +由于 $M_x=-D\left(\frac{\partial^2 w}{\partial x^2}+\mu \frac{\partial^2 w}{\partial y^2}\right)$, 且 $\frac{\partial^2 w}{\partial y^2}=0$, 所以简支边界条件可化简为: -为了演示高阶微分用法,此处 Loss 定义中使用了 `paddle.grad` API 计算 `y` 对 `x` 二阶微分,并使用 `L2 norm` 归一化。 +$$ +(w)\_{x=-1 \mid x=+1}=0, \quad\left(\frac{\partial^2 w}{\partial x^2}\right)\_{x=-1 \mid x=+1}=0 +$$ -```python -grad_x, = paddle.grad(y, x, create_graph=True) # create_graph=True,创建下一阶计算图 -grad_grad_x, = paddle.grad(grad_x, x, create_graph=True) # create_graph=True,创建下一阶计算图 -loss = paddle.norm(grad_grad_x, p=2) -``` +上下两边 $(y=-0.5 \mid y=+0.5)$ 为自由边界条件, 弯矩、扭矩、横向剪切力都为 $0$ : -**第五步:执行反向计算过程,使用用 Adam 优化器更新参数** +$$ +\left(M\_y\right)\_{\mathrm{y}=-0.5 \mid \mathrm{y}=+0.5}=0, \quad\left(M\_{x y}\right)\_{\mathrm{y}=-0.5 \mid \mathrm{y}=+0.5}=0, 
\quad\left(Q\_y\right)\_{\mathrm{y}=-0.5 \mid \mathrm{y}=+0.5}=0 +$$ -```python -optim = paddle.optimizer.Adam(parameters=net.parameters()) -loss.backward() -optim.step() -``` +由于 $M_y=-D\left(\frac{\partial^2 w}{\partial y^2}+\mu \frac{\partial^2 w}{\partial x^2}\right), \quad M_{x y}=-D(1-\mu) \frac{\partial^2 w}{\partial x \partial y}, \quad Q_y=-D \frac{\partial}{\partial y}\left(\frac{\partial^2 w}{\partial x^2}+\frac{\partial^2 w}{\partial y^2}\right)$ ,且扭矩可以变换为等效剪力, 扭矩和横向剪力合并为 $\left(Q_y+\frac{\partial M_{x y}}{\partial x}\right)_{\mathrm{y}=-0.5 \mid \mathrm{y}=+0.5}=0$, 所以自由边界条件用挠度表示为 -### 4.1 自动微分相关 API +$$ +\left(\frac{\partial^2 w}{\partial y^2}+\mu \frac{\partial^2 w}{\partial x^2}\right)\_{y=-0.5 \mid y=+0.5}=0, \quad\left(\frac{\partial^3 w}{\partial y^3}+(2-\mu) \frac{\partial^3 w}{\partial x^2 \partial y}\right)\_{y=-0.5 \mid y=+0.5}=0 +$$ -| API 名称 | API 功能 | -|:-------------------------------------------------------------------------------------------------------------------------------------|:---------------| -| [paddle.grad](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/grad_cn.html#grad) | 反向模式自动微分 | -| [paddle.auto.jacobian](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/autograd/jacobian_cn.html#jacobian) | 雅可比矩阵计算 | -| [paddle.autograd.hessian](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/autograd/hessian_cn.html#hessian) | 海森矩阵计算 | +### 4.2 使用飞桨原生 API 求解 -**使用反向微分 API paddle.grad 计算 tanh 高阶导数** +接下来给出上述问题转换成的飞桨代码。 ```python import paddle - -# 组网代码 -x = paddle.rand((2,)) -y = paddle.tanh(x) -grad1 = paddle.grad(y, x, create_graph=True) # 一阶微分 -grad2 = paddle.grad(grad1, x, create_graph=True) # 二阶微分 -grad3 = paddle.grad(grad2, x) # 三阶微分 - -print(grad1, grad2, grad3) -# [0.41997433] [-0.6397] [0.6216267] +import numpy as np +from paddle import grad +from matplotlib import pyplot as plt + +# 设置薄板计算域长、宽参数 +Lx = 2.0 # 薄板 x 方向长度(m) +Ly = 1.0 # 薄板 y 方向宽度(m) + +# 
设置方程参数 +E = 210000.0e6 # 弹性模量(Pa) +mu = 0.28 # 薄板泊松比(无量纲) +h = 0.01 # 薄板厚度(m) +D = E * (h**3) / (12 * (1 - mu**2)) # 薄板弯曲刚度(N*m) +q = 1000.0 # 均布载荷(N/m^2) + +in_channels = 2 # 输入为 x, y +out_channels = 1 # 输出为 w + +model = paddle.nn.Sequential( + paddle.nn.Linear(in_features=in_channels, out_features=32), + paddle.nn.Silu(), + paddle.nn.Linear(in_features=32, out_features=64), + paddle.nn.Silu(), + paddle.nn.Linear(in_features=64, out_features=32), + paddle.nn.Silu(), + paddle.nn.Linear(in_features=32, out_features=out_channels), +) # 建立一个简单的多层全连接层模型,包含 4 个线性层和 3 个激活函数。 + +opt = paddle.optimizer.Adam(5e-4, parameters=model.parameters()) + + +def grad_with_order(y, x, k): + # Compute the gradient of y with respect to x at order k. + g = y + for _ in range(k): + g = grad(g, x, create_graph=True)[0] + + return g + + +def mse(pred, label): + # Compute the mean squared error between the predicted and true values. + return ((pred - label) ** 2).mean() + + +for i in range(1000): + # 1. define pde loss + np_rand_xy = np.random.uniform( + [-Lx / 2, -Ly / 2], [Lx / 2, Ly / 2], size=(1000, 2) + ).astype("float32") + x = paddle.to_tensor(np_rand_xy[:, 0:1], stop_gradient=False) # [1000, 1] + y = paddle.to_tensor(np_rand_xy[:, 1:2], stop_gradient=False) # [1000, 1] + tensor_input = paddle.concat([x, y], axis=1) # [1000, 2] + output = model(tensor_input) + w = output + w_x_x = grad_with_order(w, x, 2) + w_y_y = grad_with_order(w, y, 2) + w_x_x_x_x = grad_with_order(w_x_x, x, 2) + w_y_y_y_y = grad_with_order(w_y_y, y, 2) + w_x_x_y_y = grad_with_order(w_x_x, y, 2) + + loss_pde = mse(w_x_x_x_x + 2 * w_x_x_y_y + w_y_y_y_y, q / D) + + # 2. 
define bc_left_right_loss + np_rand_x = np.random.choice([-Lx / 2, Lx / 2], size=(50, 1)).astype("float32") + np_rand_y = np.random.uniform(-Ly / 2, Ly / 2, size=(50, 1)).astype("float32") + x = paddle.to_tensor(np_rand_x, stop_gradient=False) # [50, 1] + y = paddle.to_tensor(np_rand_y, stop_gradient=False) # [50, 1] + tensor_input = paddle.concat([x, y], axis=1) # [50, 2] + output = model(tensor_input) + w = output + w_x_x = grad_with_order(w, x, 2) + + loss_bc_lr = mse(w, 0) + mse(w_x_x, 0) + + # 3. define bc_loss + np_rand_x = np.random.uniform(-Lx / 2, Lx / 2, size=(50, 1)).astype("float32") + np_rand_y = np.random.choice([-Ly / 2, Ly / 2], size=(50, 1)).astype("float32") + x = paddle.to_tensor(np_rand_x, stop_gradient=False) # [50, 1] + y = paddle.to_tensor(np_rand_y, stop_gradient=False) # [50, 1] + tensor_input = paddle.concat([x, y], axis=1) # [50, 2] + output = model(tensor_input) + w = output + w_x_x = grad_with_order(w, x, 2) + w_y_y = grad_with_order(w, y, 2) + w_y_y_y = grad_with_order(w_y_y, y, 1) + w_x_x_y = grad_with_order(w_x_x, y, 1) + loss_bc_ud = mse(w_y_y + mu * w_x_x, 0) + mse(w_y_y_y + (2 - mu) * w_x_x_y, 0) + + # loss backward and update parameters + loss = loss_pde + loss_bc_lr + loss_bc_ud + opt.clear_grad() + loss.backward() + opt.step() + if i % 10 == 0: + print(f"Loss at iter {i}: {loss.item():.3e}") + + +# plot result +num_cord0 = 101 +num_cord1 = 101 +num_cords = num_cord0 * num_cord1 +print(f"num_cords = {num_cords}") +x, y = np.meshgrid( + np.linspace( + start=-Lx / 2, stop=Lx / 2, num=num_cord0, endpoint=True, dtype="float32" + ), + np.linspace( + start=-Ly / 2, stop=Ly / 2, num=num_cord1, endpoint=True, dtype="float32" + ), +) +x = x.ravel() +y = y.ravel() +# predict solution of w(x, y) on the 2D grid +w_pred = model(paddle.stack((paddle.to_tensor(x), paddle.to_tensor(y)), axis=1)) +w_pred = w_pred.numpy() +fig = plt.figure(100, figsize=(5, 4)) +y_min = w_pred.min(axis=(0,))[0] +y_max = w_pred.max(axis=(0,))[0] +ax1 = 
plt.subplot(1, 1, 1) +plt.tricontourf(x, y, w_pred[:, 0], levels=30, cmap="rainbow") +print(x.shape, y.shape, w_pred.shape) +cb1 = plt.colorbar() +plt.axis("equal") +plt.xlabel("$x (m)$") +plt.ylabel("$y (m)$") +plt.title(f"w-field: [{y_min:.6f}, {y_max:.6f}]", fontsize=9.5) +# plt.show() +plt.savefig("./result.jpg") +print("saved matplotlib to: ./result.jpg") ``` -**使用 paddle.autograd.jacobian 计算 Jacobian 矩阵** +### 4.3 使用 PaddleScience API 求解 + +基于飞桨框架,我们开发了科学计算套件 [**PaddleScience**](https://paddlescience-docs.readthedocs.io/zh-cn/latest/),并提供了更上层的 API: [`ppsci.lambdify`](https://paddlescience-docs.readthedocs.io/zh-cn/latest/zh/api/utils/symbolic/?h=#ppsci.utils.symbolic.lambdify)。`ppsci.lambdify` 可以自动将 sympy 表达式转换为基于 Paddle 原生 API 的计算函数,从而避免用户多次显式调用 `paddle.grad`,同时该 API 具备子表达式缓存机制,使得用户无需再关注中间变量的复用。 ```python import paddle - -x1 = paddle.randn([3]) -x2 = paddle.randn([3]) -x1.stop_gradient = False -x2.stop_gradient = False - -y = x1 + x2 - -J = paddle.autograd.jacobian(y, (x1, x2)) -J_y_x1 = J[0][:] # evaluate result of dy/dx1 -J_y_x2 = J[1][:] # evaluate result of dy/dx2 - -print(J_y_x1.shape) -# [3, 3] -print(J_y_x2.shape) -# [3, 3] +import numpy as np +from matplotlib import pyplot as plt +import ppsci +import sympy as sp + +# 设置薄板计算域长、宽参数 +Lx = 2.0 # 薄板 x 方向长度(m) +Ly = 1.0 # 薄板 y 方向宽度(m) + +# 设置方程参数 +E = 210000.0e6 # 弹性模量(Pa) +mu = 0.28 # 薄板泊松比(无量纲) +h = 0.01 # 薄板厚度(m) +D = E * (h**3) / (12 * (1 - mu**2)) # 薄板弯曲刚度(N*m) +q = 1000.0 # 均布载荷(N/m^2) + +in_channels = 2 # 输入为 x, y +out_channels = 1 # 输出为 w + +model = ppsci.arch.MLP( + ["x", "y"], ["w"], num_layers=None, hidden_size=[32, 64, 32], activation="silu" +) + +opt = paddle.optimizer.Adam(5e-4, parameters=model.parameters()) + + +def mse(pred, label): + # Compute the mean squared error between the predicted and true values. 
+ return ((pred - label) ** 2).mean() + + +# 使用 sympy 库计算符号公式 +x, y = sp.symbols("x y") # 定义符号变量 x, y +w = sp.Function("w")(x, y) # 定义函数 w(x,y) +left = ( + w.diff(x, 4) + 2 * w.diff(x, 2).diff(y, 2) + w.diff(y, 4) +) # 定义薄板弯曲的双调和方程的左侧部分 +bc_lr = w.diff(x, 2) +bc_ud1 = w.diff(y, 2) + mu * w.diff(x, 2) +bc_ud2 = w.diff(y, 3) + (2 - mu) * w.diff(x, 2).diff(y) + +# lambdify the sympy expression to a callable function +pde_func = ppsci.lambdify(left, model) +bc_lr_func = ppsci.lambdify(bc_lr, model) +bc_ud1_func, bc_ud2_func = ppsci.lambdify([bc_ud1, bc_ud2], model) + +for i in range(1000): + # 1. define pde loss + np_rand_xy = np.random.uniform( + [-Lx / 2, -Ly / 2], [Lx / 2, Ly / 2], size=(1000, 2) + ).astype("float32") + x = paddle.to_tensor(np_rand_xy[:, 0:1], stop_gradient=False) # [1000, 1] + y = paddle.to_tensor(np_rand_xy[:, 1:2], stop_gradient=False) # [1000, 1] + data_dict = {"x": x, "y": y} + pde_out = pde_func(data_dict) + loss_pde = mse(pde_out, q / D) + + # 2. define bc_left_right_loss + np_rand_x = np.random.choice([-Lx / 2, Lx / 2], size=(50, 1)).astype("float32") + np_rand_y = np.random.uniform(-Ly / 2, Ly / 2, size=(50, 1)).astype("float32") + x = paddle.to_tensor(np_rand_x, stop_gradient=False) # [50, 1] + y = paddle.to_tensor(np_rand_y, stop_gradient=False) # [50, 1] + data_dict = {"x": x, "y": y} + w_x_x = bc_lr_func(data_dict) + loss_bc_lr = mse(data_dict["w"], 0) + mse(w_x_x, 0) + + # 3. 
define bc_loss + np_rand_x = np.random.uniform(-Lx / 2, Lx / 2, size=(50, 1)).astype("float32") + np_rand_y = np.random.choice([-Ly / 2, Ly / 2], size=(50, 1)).astype("float32") + x = paddle.to_tensor(np_rand_x, stop_gradient=False) # [50, 1] + y = paddle.to_tensor(np_rand_y, stop_gradient=False) # [50, 1] + data_dict = {"x": x, "y": y} + bc_ud1_out = bc_ud1_func(data_dict) + bc_ud2_out = bc_ud2_func(data_dict) + loss_bc_ud = mse(bc_ud1_out, 0) + mse(bc_ud2_out, 0) + + # loss backward and update parameters + loss = loss_pde + loss_bc_lr + loss_bc_ud + opt.clear_grad() + loss.backward() + opt.step() + if i % 10 == 0: + print(f"Loss at iter {i}: {loss.item():.3e}") + +# plot result +num_cord0 = 101 +num_cord1 = 101 +num_cords = num_cord0 * num_cord1 +print(f"num_cords = {num_cords}") +x, y = np.meshgrid( + np.linspace( + start=-Lx / 2, stop=Lx / 2, num=num_cord0, endpoint=True, dtype="float32" + ), + np.linspace( + start=-Ly / 2, stop=Ly / 2, num=num_cord1, endpoint=True, dtype="float32" + ), +) +x = x.ravel() +y = y.ravel() +# predict solution of w(x, y) on the 2D grid +w_pred = model.forward_tensor( + paddle.stack((paddle.to_tensor(x), paddle.to_tensor(y)), axis=1) +) +w_pred = w_pred.numpy() +fig = plt.figure(100, figsize=(5, 4)) +y_min = w_pred.min(axis=(0,))[0] +y_max = w_pred.max(axis=(0,))[0] +ax1 = plt.subplot(1, 1, 1) +plt.tricontourf(x, y, w_pred[:, 0], levels=30, cmap="rainbow") +print(x.shape, y.shape, w_pred.shape) +cb1 = plt.colorbar() +plt.axis("equal") +plt.xlabel("$x (m)$") +plt.ylabel("$y (m)$") +plt.title(f"w-field: [{y_min:.6f}, {y_max:.6f}]", fontsize=9.5) +# plt.show() +plt.savefig("./result.jpg") +print("saved matplotlib to: ./result.jpg") ``` -**使用 paddle.autograd.hessian 计算 Hessian 矩阵** +## 五、飞桨支撑科学计算 AI4S + +基于飞桨框架 3.0 为科学计算提供了高阶自动微分、编译优化、分布式训练能力支撑,提供了面向通用数理问题求解的赛桨 [**PaddleScience**](https://paddlescience-docs.readthedocs.io/zh-cn/latest/) 以及专注于生物计算的螺旋桨 [**PaddleHelix**](https://paddlehelix.baidu.com/) 工具组件。为了更好地支撑 AI for Science 
生态,飞桨对国内外主流开源科学计算工具进行了适配,并被国际主流的科学计算深度学习库 DeepXDE 唯一推荐。 -```python -import paddle +### 5.1 飞桨 + Modulus-sym -x1 = paddle.randn([3, ]) -x2 = paddle.randn([4, ]) -x1.stop_gradient = False -x2.stop_gradient = False - -y = x1.sum() + x2.sum() - -H = paddle.autograd.hessian(y, (x1, x2)) -H_y_x1_x1 = H[0][0][:] # evaluate result of ddy/dx1x1 -H_y_x1_x2 = H[0][1][:] # evaluate result of ddy/dx1x2 -H_y_x2_x1 = H[1][0][:] # evaluate result of ddy/dx2x1 -H_y_x2_x2 = H[1][1][:] # evaluate result of ddy/dx2x2 - -print(H_y_x1_x1.shape) -# [3, 3] -print(H_y_x1_x2.shape) -# [3, 4] -print(H_y_x2_x1.shape) -# [4, 3] -print(H_y_x2_x2.shape) -# [4, 4] -``` +飞桨利用高阶自动微分与编译优化技术,在与 NVIDIA 合作适配其 AI Physics 工具 Modulus-sym 的过程中,成功完成了全量模型适配([**Modulus-sym(paddle-backend)**](https://github.com/PaddlePaddle/modulus-sym/tree/paddle?tab=readme-ov-file#modulus-symbolic-betapaddle-backend)),实现了方程求解类模型性能的大幅优化,相比 Modulus-sym 现有后端**求解速度平均提升 115%**; -## 五、飞桨支撑科学计算 AI4S +![ai4s.png](https://raw.githubusercontent.com/PaddlePaddle/docs/develop/docs/guides/paddle_v3_features/images/higher_order_ad/ai4s.png) -基于飞桨框架 3.0 为科学计算提供了高阶自动微分、编译优化、分布式训练能力支撑,提供了面向通用数理问题求解的赛桨 PaddleScience 以及专注于生物计算的螺旋桨 PaddleHelix 工具组件。为了更好地支撑 AI for Science 生态,飞桨对国内外主流开源科学计算工具进行了适配,并被国际主流的科学计算深度学习库 DeepXDE 唯一推荐。 +> 上述测试环境为:cuda 11.8, A100-SXM4-40GB, torch 2.6(2236df1), paddle 3.0(388165), ips = total_batch_size / batch_cost(ms) -此外,飞桨利用高阶自动微分与编译优化技术,在与 NVIDIA 合作适配其 AI Physics 工具 Modulus-sym 的过程中,成功完成了全量模型适配([Modulus-sym[paddle-backend]](https://github.com/PaddlePaddle/modulus-sym/tree/paddle?tab=readme-ov-file#modulus-symbolic-betapaddle-backend)),实现了方程求解类模型性能的大幅优化,相比 Modulus-sym 现有后端**求解速度平均提升 71%**; +### 5.2 飞桨 + DeePMD-kit -在 AI 分子动力学套件 [DeePMD-kit](https://docs.deepmodeling.com/projects/deepmd/en/latest/train/training.html) 中,我们对 dpa2, se_atten, se_e2_a 进行了动态图和编译器适配,相比 DeePMD-kit torch 后端,**求解速度分别提升了 102.6%, 40.5%, 102.6%**,相关结果已公开至论文:[DeePMD-kit v3: A Multiple-Backend Framework for Machine Learning 
Potentials](https://arxiv.org/abs/2502.19161)。 +在 AI 分子动力学套件 [**DeePMD-kit**](https://docs.deepmodeling.com/projects/deepmd/en/latest/train/training.html) 中,我们对 dpa2, se_atten, se_e2_a 进行了动态图和编译器适配,相比 DeePMD-kit torch 后端,**求解速度分别提升了 102.6%, 40.5%, 102.6%**,相关结果已公开至论文:[DeePMD-kit v3: A Multiple-Backend Framework for Machine Learning Potentials](https://arxiv.org/abs/2502.19161)。 -
- -
+基于飞桨后端运行 DeePMD-kit 可参考:[5.1. Train a model](https://docs.deepmodeling.com/projects/deepmd/en/latest/train/training.html) -| 模型名称/平均耗时(s/batch) | Torch(dygraph) | Paddle(dygraph) | Paddle(CINN) | IPS 提升率 | -|:---------------------------|:-------|:-------|:-------------|:-----------| -| dpa2 | 0.1064 | 0.120 | **0.053** | 102.6% | -| se_atten | 0.0336 | 0.049 | **0.024** | 40.5% | -| se_e2_a | 0.0227 | 0.025 | **0.011** | 102.6% | +| 模型名称/平均耗时(s/batch) | Torch(dygraph) | Paddle(dygraph) | Paddle(CINN) | IPS 提升率 | +|:---------------------------|:---------------|:----------------|:-------------|:-----------| +| dpa2 | 0.1064 | 0.120 | **0.053** | 102.6% | +| se_atten | 0.0336 | 0.049 | **0.024** | 40.5% | +| se_e2_a | 0.0227 | 0.025 | **0.011** | 102.6% | -> cuda 11.8, A100-SXM4-40GB, torch 2.6(2236df1), paddle 3.0(86994e3), ips 提升率 = (torch/paddle-1) +> 上述测试环境为:cuda 11.8, A100-SXM4-40GB, torch 2.6(2236df1), paddle 3.0(86994e3), ips 提升率 = (torch/paddle-1) diff --git a/docs/guides/paddle_v3_features/images/cinn/PaddleX_cinn_vs_torch.png b/docs/guides/paddle_v3_features/images/cinn/PaddleX_cinn_vs_torch.png new file mode 100644 index 00000000000..256e8d5e86b Binary files /dev/null and b/docs/guides/paddle_v3_features/images/cinn/PaddleX_cinn_vs_torch.png differ diff --git a/docs/guides/paddle_v3_features/images/cinn/modulus_cinn_vs_torch.png b/docs/guides/paddle_v3_features/images/cinn/modulus_cinn_vs_torch.png new file mode 100644 index 00000000000..6a831caaa94 Binary files /dev/null and b/docs/guides/paddle_v3_features/images/cinn/modulus_cinn_vs_torch.png differ diff --git a/docs/guides/paddle_v3_features/images/higher_order_ad/ai4s.png b/docs/guides/paddle_v3_features/images/higher_order_ad/ai4s.png index 4e1ae538802..109176da39b 100644 Binary files a/docs/guides/paddle_v3_features/images/higher_order_ad/ai4s.png and b/docs/guides/paddle_v3_features/images/higher_order_ad/ai4s.png differ diff --git 
a/docs/guides/paddle_v3_features/images/overview/paddle_v3_ai4s_overview.png b/docs/guides/paddle_v3_features/images/overview/paddle_v3_ai4s_overview.png new file mode 100644 index 00000000000..1f5945e0647 Binary files /dev/null and b/docs/guides/paddle_v3_features/images/overview/paddle_v3_ai4s_overview.png differ diff --git a/docs/guides/paddle_v3_features/images/overview/paddle_v3_autoparallel_speed.png b/docs/guides/paddle_v3_features/images/overview/paddle_v3_autoparallel_speed.png new file mode 100644 index 00000000000..7a45e4aaa81 Binary files /dev/null and b/docs/guides/paddle_v3_features/images/overview/paddle_v3_autoparallel_speed.png differ diff --git a/docs/guides/paddle_v3_features/images/overview/paddle_v3_autoparallel_workflow.png b/docs/guides/paddle_v3_features/images/overview/paddle_v3_autoparallel_workflow.png new file mode 100644 index 00000000000..2b077b03084 Binary files /dev/null and b/docs/guides/paddle_v3_features/images/overview/paddle_v3_autoparallel_workflow.png differ diff --git a/docs/guides/paddle_v3_features/images/overview/paddle_v3_highorder_autodiff.png b/docs/guides/paddle_v3_features/images/overview/paddle_v3_highorder_autodiff.png new file mode 100644 index 00000000000..4b18dbfbbae Binary files /dev/null and b/docs/guides/paddle_v3_features/images/overview/paddle_v3_highorder_autodiff.png differ diff --git a/docs/guides/paddle_v3_features/images/overview/paddle_v3_infer_deepseek.png b/docs/guides/paddle_v3_features/images/overview/paddle_v3_infer_deepseek.png new file mode 100644 index 00000000000..5eba20f9dc8 Binary files /dev/null and b/docs/guides/paddle_v3_features/images/overview/paddle_v3_infer_deepseek.png differ diff --git a/docs/guides/paddle_v3_features/images/overview/paddle_v3_train_infer.png b/docs/guides/paddle_v3_features/images/overview/paddle_v3_train_infer.png new file mode 100644 index 00000000000..cb742390e8e Binary files /dev/null and 
b/docs/guides/paddle_v3_features/images/overview/paddle_v3_train_infer.png differ diff --git a/docs/guides/paddle_v3_features/images/paddle-trt/model_original.png b/docs/guides/paddle_v3_features/images/paddle-trt/model_original.png new file mode 100644 index 00000000000..7510b66b81f Binary files /dev/null and b/docs/guides/paddle_v3_features/images/paddle-trt/model_original.png differ diff --git a/docs/guides/paddle_v3_features/images/paddle-trt/model_trt.png b/docs/guides/paddle_v3_features/images/paddle-trt/model_trt.png new file mode 100644 index 00000000000..9e6b0aaed3f Binary files /dev/null and b/docs/guides/paddle_v3_features/images/paddle-trt/model_trt.png differ diff --git a/docs/guides/paddle_v3_features/index_cn.rst b/docs/guides/paddle_v3_features/index_cn.rst index 7601e651b78..9645619f7a4 100644 --- a/docs/guides/paddle_v3_features/index_cn.rst +++ b/docs/guides/paddle_v3_features/index_cn.rst @@ -9,9 +9,17 @@ - `动静统一自动并行 `_ :介绍了飞桨动静统一的自动并行编程范式 +- `大模型训推一体 `_ :同一套框架支持训练和推理,实现训练、推理代码复用和无缝衔接 + +- `Paddle Inference(TensorRT 子图引擎) `_ :介绍了飞桨结合 TensorRT 子图引擎推理的使用方式和工作原理. 
+ +- `科学计算高阶微分 `_ :介绍了飞桨高阶自动微分在科学计算领域的应用 + - `神经网络编译器 `_ :介绍了神经网络编译器自动优化的基本原理、架构和功能 -- `高阶自动微分 `_ :介绍了飞桨高阶自动微分在科学计算领域的应用 +- `异构多芯适配 `_ :构建多硬件统一适配方案,通过标准化接口屏蔽了不同芯片软件栈开发接口差异 + +**其他重要框架基础技术升级:** + - `动转静 SOT 原理及使用 `_ :介绍了动转静 SOT 原理及使用方式 @@ -27,3 +35,4 @@ sot_cn.md paddle_ir_cn.md comate_paddle_cn.md + paddle_trt_cn.md diff --git a/docs/guides/paddle_v3_features/overview_cn.md b/docs/guides/paddle_v3_features/overview_cn.md index 9a7ca61f051..f9f6fc0e27c 100644 --- a/docs/guides/paddle_v3_features/overview_cn.md +++ b/docs/guides/paddle_v3_features/overview_cn.md @@ -1,87 +1,90 @@ -# 飞桨框架 3.0 新特性 +# 飞桨框架3.0正式版发布——加速大模型时代的技术创新与产业应用 + +作为中国首个自主研发的产业级深度学习平台,飞桨一直坚持开源路线,支撑产业智能化升级。2025年3月31日,飞桨框架迎来重大更新,发布飞桨框架3.0正式版。飞桨框架3.0版本不仅延续了飞桨框架2.0系列动静统一、训推一体的特性,更在自动并行、神经网络编译器、高阶自动微分等方面取得突破,为大模型时代的技术创新与产业应用提供了强大支撑,为开发者打造了一站式、高性能的深度学习开发体验。无论是前沿算法研究还是产业级大模型落地,飞桨框架3.0都将成为开发者的首选利器。 +飞桨框架3.0着重推出了以下五大新特性: +* 动静统一自动并行:通过少量的张量切分标记,即可自动完成分布式切分信息的推导,Llama预训练场景减少80%的分布式相关代码开发。 +* 大模型训推一体:依托高扩展性的中间表示(PIR)从模型压缩、推理计算、服务部署、多硬件推理全方位深度优化,支持文心4.5、文心X1等多款主流大模型,DeepSeek-R1满血版单机部署吞吐提升一倍。 +* 科学计算高阶微分:通过高阶自动微分和神经网络编译器技术,微分方程求解速度比PyTorch快115%。 +* 神经网络编译器:通过算子自动融合技术,无需手写CUDA等底层代码,部分算子执行速度提升4倍,模型端到端训练速度提升27.4%。 +* 异构多芯适配:通过对硬件接入模块进行抽象,降低异构芯片与框架适配的复杂度,兼容硬件差异,初次跑通所需适配接口数比PyTorch减少56%,代码量减少80%。 + +## 概述 +在大模型时代,深度学习框架的重要性愈发凸显,成为推动人工智能技术发展的核心引擎。算法、算力、数据作为人工智能技术的三大要素,其相互作用与协同发展不断催生着新的突破。越来越多的实例证明,算法创新能够发挥出更为显著的威力。DeepMind的AlphaFold3通过动态扩散算法突破蛋白质结构预测精度,已成功应用于抗疟疾等药物分子设计;DeepSeek通过算法创新,成功提升了DeepSeek V3模型的性价比,大幅降低了训练成本。这些突破性进展表明,算法创新正在重构技术发展的成本曲线。 +然而,算法创新并非易事,当前算法工程师和科研人员在使用现有深度学习框架进行算法创新时,仍面临诸多挑战。 +* 大模型分布式开发门槛高。大模型参数规模庞大,其分布式训练需使用复杂的并行策略,包括数据并行、张量并行、参数分片并行、流水线并行、序列并行、专家并行等。大模型开发中,如何实现多种并行策略的高效协同已成为关键瓶颈。 +* 模型推理部署困难重重。由于算法训练和推理任务的计算、通信存在较大差别,算法工程师在完成模型算法创新后,往往难以直接应用于推理部署,需要大量的工程开发工作。 +* 前沿模型架构灵活多变。科学智能(AI for Science)等新兴领域的快速发展,对深度学习框架提出了新的要求,包括求解复杂微分方程所需的高阶自动微分、傅里叶变换等科学计算操作、复数的高效运算等。 +* 模型极致性能优化难度大。以大模型为代表的很多场景对训练推理速度有严苛要求,为突破计算瓶颈,工程实践中常需通过手写CUDA内核代码进行性能优化,这对算法工程师的底层编程能力提出了极高要求。 +* 
模型算力需求多样。AI应用场景丰富多样、算力需求巨大,单一芯片难以满足业务需求。而不同芯片之间的硬件架构、软件栈成熟度、开发接口差异大,业务适配成本高、软硬协同优化难。 + +为此,飞桨新一代框架3.0应运而生:该版本提供了丰富的深度学习相关的各种开发接口;表示层专注于计算图的表达与转换,通过高可扩展中间表示 PIR,实现动转静、自动微分、自动并行、算子组合以及计算图优化等核心功能;调度层负责对代码或计算图进行智能编排与高效调度,支持动态图和静态图两种不同的执行模式;算子层由神经网络编译器 CINN 和算子库 PHI 共同构成,涵盖了张量定义、算子定义、算子自动融合和算子内核实现等关键功能;适配层则用于实现与底层芯片适配,包括设备管理、算子适配、通信适配以及编译接入等功能。 -## 一、概述 - -深度学习框架作为基础软件,不仅促进了深度学习技术的飞速进步,更为人工智能技术的广泛应用铺设了坚实的基础。首先深度学习框架为开发者提供了便捷易用的开发接口,这些接口对数据和操作进行了高度抽象,使得开发者能够更专注于算法和模型的设计,而不必深陷底层数据的处理细节。通过这些接口,开发者无需直接感知和应对复杂的硬件底层开发细节,从而极大地提升了开发效率和体验。其次深度学习框架还提供了自动微分这一强大功能,开发者通常只需要编写前向传播网络的代码,而繁琐的反向传播网络则交由框架自动完成。 - -飞桨框架是我国首个自主研发、开源开放且功能丰富的深度学习框架,自 2016 年起正式对外开源。2018 年,我们发布了飞桨框架 1.0 版本,该版本默认使用静态图,并着手研发动态图功能。2021 年初,飞桨框架 2.0 版本问世,它默认采用动态图,并实现了动静统一与训推一体的设计。此版本进一步融合了动态图的灵活性与静态图的高效性,同时支持了千亿参数模型的混合并行训练。在此期间,飞桨还踏上了神经网络编译器技术的探索征程。随着大模型时代的到来,模型参数规模日益扩大,训练成本也随之上升,这对深度学习框架在大规模分布式训练和性能优化方面提出了更高要求。为此,我们推出了飞桨框架 3.0 正式版,标志着飞桨新一代框架技术创新之路的开启。该版本的核心特性包括动静统一自动并行技术和神经网络编译器自动优化等新技术,旨在应对当前深度学习领域的新挑战。飞桨框架 3.x 版本延续了 2.x 版本动静统一、训推一体的设计理念,其开发接口全面兼容 2.x 版本。这意味着,使用 2.x 版本开发的代码,在绝大多数情况下无需修改,即可直接在 3.x 版本上运行。 - -以下是飞桨框架 3.x 的新特性: - -- **动静统一自动并行:** 飞桨推出了动静统一的自动并行编程范式,显著降低了编写分布式训练程序的复杂度。开发者无需深入研究并手动编写复杂的并行切分和通信代码,只需进行少量的张量切分标注,即可完成分布式模型的构建。框架能够为用户自动推导分布式切分状态并添加通信操作,同时还支持一键动转静训练,大幅简化了分布式训练代码的开发过程。 -- **神经网络编译器自动优化:** 飞桨神经网络编译器 CINN(Compiler Infrastructure for Neural Networks)采用与框架一体化的设计,能够支持生成式模型、科学计算模型等多种模型的高效训练与可变形状推理,为计算灵活性与高性能之间提供了一个良好的平衡点。通过算子的自动融合和代码生成技术,Llama2 和 Stable Diffusion 模型的性能提升了 30%。 -- **高阶自动微分:** 为了更好支持科学计算等场景,飞桨框架设计并实现了基于组合算子机制的高阶自动微分技术,结合神经网络编译器自动优化技术,我们测试了超过 40 多个科学计算场景的微分方程,其求解速度领先业界同类产品 70%。 -- **高扩展中间表示** :为了提升飞桨框架的可扩展性,我们研发了高扩展中间表示 PIR(Paddle Intermediate Representation)。这一表示系统性地抽象了底层核心概念,提供了灵活且高效的组件。PIR 
作为基础设施,支撑着动转静、自动微分、自动并行、组合算子、图优化等多项技术,并广泛应用于分布式训练、模型压缩、推理部署等场景。通过 PIR 提供的 DRR(Declarative Rewrite Rule)机制,Pass 的开发成本可以降低 60%。我们对超过 900 个模型配置进行了测试,结果显示,在使用 PIR 后,推理的整体性能提升了超过 10%。 -- **多硬件适配:** 飞桨为大模型硬件适配提供了功能完善且低成本的方案。新硬件仅需适配 30 余个接口,即可支持大模型的训练、压缩与推理。同时,飞桨提供了基于编译器的硬件接入方式,硬件厂商只需以插件的形式实现编译器的代码生成后端,便能实现与飞桨框架的高效适配。 - -上述特性在飞桨框架 2.6 版本或更早版本时就已经开始开发,目前已达到外部可试用的阶段。由于这些新特性在使用体验、性能、二次开发便利度以及硬件适配能力等方面带来了显著提升,因此我们决定发布 3.0 正式版。此版本包含了对框架 2.x 版本部分已有功能的改进,并且在不使用新特性的情况下,表现是成熟稳定的。 - -## 二、设计思想 - -当前,AI 技术的发展正日新月异,引领着科技的前沿。深度学习框架的设计对于推动人工智能技术的发展至关重要,其核心设计目标是让深度学习技术的创新与应用更简单。那么如何做到这一点呢?我们需要从以下几个方面来考虑。 - -首先,框架向上对接开发者的需求。一个优秀的深度学习框架应当为开发者提供极致的开发体验。这不仅仅意味着提供一个用户友好的开发环境,更重要的是要能够大幅度减少开发者的学习成本和时间成本,同时显著提升开发的便利性。为此,飞桨框架提出了“动静统一、训推一体、自动并行”的理念,极大地提高了开发效率。 - -其次,框架向下对接硬件。现代深度学习应用往往需要在多样化的硬件平台上运行,因此,框架必须能够兼容并适配各种不同的硬件设备。这要求框架能够智能地隔离不同硬件接口之间的差异,实现广泛的硬件适配性。同时,为了充分发挥硬件的性能,框架还需要具备软硬件协同工作的能力,确保在利用硬件资源时能够达到最优的性能表现。 - -再者,框架需要考虑到 AI 技术发展的整体趋势。随着技术的不断进步,诸如 MOE(Mixture of Experts)、多模态以及科学智能(AI for Science)等前沿技术逐渐成为新的研究热点。深度学习框架应当能够紧跟这些技术发展的步伐,为研究者提供必要的支持和工具,以推动相关技术的快速发展和应用。在大模型领域,模型的显著特点是参数规模庞大、训练数据海量,以及对算力的巨大需求。随着模型复杂性的增加,计算瓶颈、存储瓶颈、访存瓶颈以及通信瓶颈等问题逐渐凸显。同时新的网络结构如 RWKV、Mamba 等也在不断涌现,为 AI 技术的发展注入了新的活力。为了解决这些问题,分布式训练和通用性能优化的需求日益迫切。在 AI for Science 领域,人工智能正引发科学发现和模式创新的深刻变革。以 AlphaFold 为代表的生物计算模型,GraphCast 等气象模型,物理信息神经网络(PINN)和傅里叶算子学习方法(FNO)都展示了 AI 在科学研究中的强大能力。为了支持科学计算模型,框架的设计需要能够支持高阶自动微分、复数运算、傅里叶变换等功能。 +
+ +
-最后,框架需要能够支持产业的实际落地应用。在产业化方面,框架需要具备支持训练、压缩、推理一体化的全流程能力。这意味着,从模型的训练到优化,再到实际部署和推理,框架应当提供一套完整、高效的解决方案,以满足产业界对于深度学习技术的实际需求。 +飞桨框架3.0凭借强大的功能和优化的设计,帮助算法工程师和科研人员以更低的成本进行算法创新,并实现产业应用。以百度文心大模型为例,飞桨框架3.0在训练、推理等方面为文心大模型提供端到端优化,训练方面重点提升训练吞吐、训练有效率和收敛效率,集群训练有效率超过98%;推理部署方面通过注意力机制量化推理、通用投机解码等技术提升推理吞吐和效率;全面支持文心4.5、文心X1等大模型的技术创新和产业应用。 -总的来说,飞桨将为开发者提供一个“动静统一、训推一体、自动并行、自动优化、广泛硬件适配”的深度学习框架,开发者可以像写单机代码一样写分布式代码,无需感知复杂的通信和调度逻辑,即可实现大模型的开发;可以像写数学公式一样用 Python 语言写神经网络,无需使用硬件开发语言编写复杂的算子内核代码,即可实现高效运行。 +## 一、全面支持自动并行训练,降低大模型开发训练门槛 -## 三、框架架构 +在大模型时代,随着模型规模和训练数据量的不断增长,传统的单机单卡训练已无法满足需求,分布式并行训练成为加速大模型迭代的关键。然而,无论是动态图还是静态图,当前市场上的并行训练框架普遍存在使用成本高的问题。开发者既要熟知模型结构,还要深入了解并行策略和框架调度逻辑, 使得大模型的开发和性能优化门槛非常高。这些问题使得大模型训练成为少数玩家的“游戏”,严重制约了大模型迭代的生产力。 +针对这一痛点,飞桨提出了动静统一自动并行方案。该技术通过原生动态图的编程界面与自动并行能力,同时保障了灵活性和易用性,大幅降低了大模型并行训练的开发成本;同时,利用框架动静统一的优势,一键转静使用静态优化能力,提供极致的大模型并行训练性能。开发者仅需少量的张量切分标注,框架便能自动推导出所有张量和算子的分布式切分状态,并添加合适的通信算子,保证结果正确性;最后会根据模型结构和集群信息,结合显存、调度层优化,自动寻找最高效的分布式并行策略。具体工作流程如下图所示: -为了实现深度学习框架的上述特性,我们必须对框架的架构进行精心设计,确保其能够支持各种复杂的模型构建,同时与多样化的芯片实现无缝对接。接下来,我们将通过直观的架构图,详细展示飞桨新一代框架内所涵盖的功能模块,以及这些模块之间的相互作用与联系。以下为飞桨框架 3.0 的架构图。 +
+ +
+飞桨动静统一自动并行技术的具体特点如下: +* 全面可用,适用于众多大模型训练场景。配合飞桨大模型开发套件(PaddleNLP、PaddleMIX),飞桨框架支持如Llama、DeepSeek、QwenVL等主流模型的预训练、精调阶段的训练流程和并行策略。 +* 简单易用,大幅降低大模型并行训练开发成本。飞桨自动并行功能允许用户在不考虑并行训练技巧的情况下完成算法实现。仅需借助少量API调用,即可将算法转换为并行训练程序,显著简化开发过程。以Llama2的预训练为例,传统实现方式需要开发者精细调整通信策略,以确保正确高效执行,而自动并行实现方式相比传统方式减少80%的分布式核心代码,极大降低了开发复杂度。 +* 轻松加速,一键动转静提供极致性能优化。在构建基础并行组网后,飞桨的自动并行架构进一步融入了运行时静态优化的强大功能。得益于飞桨框架独特的动静统一设计,用户仅需简单添加一行代码,即可轻松实现从动态到静态的转换。这一转换使得我们能够充分利用多种静态优化技术,显著提升训练效率,最终的训练性能不仅能够匹敌,甚至超越经过极致优化的动态图表现。
- +
-飞桨框架对外提供了丰富的深度学习相关的各种开发接口,如张量表示、数学计算、模型组网、优化策略等。通过这些接口,开发者能够便捷地构建和训练自己的深度学习模型,无需深入到底层的技术细节中去。 +* 协同文心,开源多项大模型独创优化策略。飞桨创新多项如精细化重计算,稀疏注意力计算优化、通信分组优化和灵活批次的流水线均衡优化等技术,大幅提升文心训练性能,同时,大模型优化策略的相关能力也开源在飞桨3.0框架中,助力开发者使用飞桨进行极致的大模型训练性能优化。 +未来,我们将进一步探索无需使用张量切分标记的全自动并行,让开发者可以像写单机代码一样写分布式代码,进一步提升大模型的开发体验。 -在开发接口之下,飞桨框架可以划分为 4 个层次:表示层、调度层、算子层和适配层。 +关于自动并行功能的更多介绍,请参考以下文档:[《自动并行训练》](./auto_parallel_cn.md) -- 表示层专注于计算图的表达与转换,通过高可扩展中间表示 PIR,为动转静(动态图转为静态图)、自动微分、自动并行、组合算子以及计算图优化等核心功能提供坚实支撑。 -- 调度层则负责对代码或计算图进行智能编排与高效调度,并且能够根据实际需求进行显存和内存的管理优化,支持动态图和静态图高效执行。无论开发者选择使用动态图还是静态图进行模型开发,飞桨框架都能提供高效的执行环境,同时确保资源利用的最优化。 -- 算子层由神经网络编译器 CINN 和算子库 PHI 共同构成,涵盖了张量定义、算子定义、算子自动融合和算子内核实现等关键功能。 -- 适配层则用于实现与底层芯片适配,包括设备管理、算子适配、通信适配以及编译接入等功能。 +## 二、大模型训推一体,提升推理部署效率 -## 四、动静统一自动并行 +在完成模型的开发和训练后,接下来我们需要考虑推理部署场景所面临的挑战。如何低门槛、低开发成本、快速地将模型部署到业务场景,并提供低时延、高吞吐、低算力成本的推理服务。自2.0版本起,飞桨便采用了“动静统一、训推一体”的设计理念,3.0版本也继续秉持这一理念。 +在推理部署方面,相较于动态图,静态图不仅可部署范围更为广泛,它够通过整图导出的方式,摆脱对Python源代码和执行环境的依赖;而且更适合进行全局调优,可通过手写或者借助编译器自动实现算子融合等方式来加速推理过程。 +得益于动静统一的架构和接口设计,飞桨能够完整支持动态图和静态图这两种不同的运行模式,并且具备出色的整图导出能力。飞桨的动转静整图导出成功率高达95%,高于PyTorch的62%。“训推一体”意味着能够在同一套框架下,尽可能复用训练和推理的代码,特别是复用模型组网代码。在完成模型的开发训练后,只需进行少量的开发工作,即可实现快速推理部署。与业界当前先使用PyTorch和DeepSpeed进行训练,再采用vLLM、SGLang、ONNXRuntime等推理引擎进行推理部署的方案相比,飞桨采用训练和推理使用同一套框架的方式,能够有效避免不同框架之间可能出现的版本兼容性问题,以及因模型结构变化、中间表示差异、算子实现差异等带来的困扰。 -### 4.1 动静统一 +
+ +
-我们来回顾下飞桨框架所提供的静态图和动态图两种开发模式。这两种模式在模型组网阶段的代码是完全一致的,因此我们称之为动静统一的组网方式。然而,它们之间的主要差异体现在计算图的构建和执行过程中。在静态图开发模式下,一旦计算图被创建,它将保持不变。这意味着,在运行阶段,不能再根据输入的计算数据作为判断条件,来调整计算图。相反,在动态图开发模式下,每当输入新的数据批次时,计算图会动态地生成和执行。这种灵活性使得动态图模式在现代深度学习任务中备受欢迎。然而,尽管动态图模式具有诸多优势,但也存在一个问题:由于计算图会频繁地创建和执行,这使得对其进行优化变得相当困难。特别是在推理部署场景下,动态图模式往往难以摆脱对 Python 解释器的依赖进行部署。而 Python 解释器的引入,在某些场景下,比如对性能要求很高的大模型推理部署场景或者资源受限的端侧场景,可能会导致效率低下或无法使用。为了克服这一难题,飞桨研发了动静转换技术,通过简单的一行命令(to_static),便能够将动态图的代码轻松转换为静态图代码。 +大模型的推理部署需要更好地平衡成本、性能和效果,飞桨框架3.0全面升级了大模型推理能力,依托高扩展性的中间表示(PIR)从模型压缩、推理计算、服务部署、多硬件推理全方位深度优化,能够支持众多开源大模型进行高性能推理,并在DeepSeek V3/R1上取得了突出的性能表现。飞桨框架3.0支持了DeepSeek V3/R1满血版及其系列蒸馏版模型的FP8推理,并且提供INT8量化功能,破除了Hopper架构的限制。此外,还引入了4比特量化推理,使得用户可以单机部署,降低成本的同时显著提升系统吞吐一倍,提供了更为高效、经济的部署方案。在性能优化方面,我们对MLA算子进行多级流水线编排、精细的寄存器及共享内存分配优化,性能相比FlashMLA最高可提升23%。综合FP8矩阵计算调优及动态量化算子优化等基于飞桨框架3.0的DeepSeek R1 FP8推理,单机每秒输出token数超1000;若采用4比特单机部署方案,每秒输出token数可达2000以上!推理性能显著领先其他开源方案。此外,还支持了MTP投机解码,突破大批次推理加速,在解码速度保持不变的情况下,吞吐提升144%;吞吐接近的情况下,解码速度提升42%。针对长序列Prefill阶段,通过注意力计算动态量化,首token推理速度提升37%。 -飞桨采用的技术方案是源代码到源代码的转换,即分析并转写动态图 Python 源代码,进而生成对应的静态图 Python 源代码;在获取源代码后,使用静态 Python 解释器来执行这段静态图代码,从而得到计算图表示。动静转换技术的核心挑战在于对 Python 语法的支持程度。通过实际测试,我们发现飞桨对 Python 语法的支持率高达 94%,飞桨的动静转换功能在整图导出任务的成功率高达 95%。飞桨框架的优势在于它同时兼容动态图和静态图两种开发模式。因此,在进行动静转换时,仅需实现从动态图 Python 源代码到静态图 Python 源代码的转换。这一转换过程可以通过 Python 解释器进一步增强对 Python 语法的支持,从而大大降低了实现的难度。在训练场景,针对那些无法进行动静转换的情况,例如 Python 代码中调用 NumPy 等第三方库时,这些库的函数调用无法直接转换为静态图表示。为了解决这一问题,飞桨创新性地研发了“自适应图构建机制”。当遇到不支持的语法时,该机制会被触发,自动断开这些部分,并利用前后相邻的图进行重新构建。通过采用这种方案,我们在训练场景中可以实现 100% 的动静转换成功率,从而为编译器等计算图优化技术提供了更广阔的空间。更多关于动静转换的信息,请参考以下链接:[《动转静 SOT 原理及使用》](./sot_cn.md) +
+ +
-### 4.2 自动并行 -在大模型场景中,分布式训练必不可少。当前用户使用 Megatron、DeepSpeed 等动态图手动并行框架开发分布式策略时,往往需要精心处理计算、通信、调度等多元逻辑,才能编写出正确的分布式代码,这无疑提高了开发的难度。为了解决这一难题,我们提出了动静统一的自动并行方案。在自动并行的编程范式下,开发者只需要在单卡模型组网的基础上提供集群声明以及少量的标记信息,飞桨框架可以根据模型结构和集群信息自动寻找合适的分布式训练策略。 +## 三、助力科学前沿探索,提升微分方程求解速度 -在做分布式标记时,我们使用 ProcessMesh 将一个设备(比如一块 GPU 卡)映射为一个进程,将多个设备映射为多个进程组成的一维或多维数组。下图展示了由 8 个设备构成的两种不同 ProcessMesh 抽象表示。 +人工智能正以前所未有的方式重塑科学研究范式,成为推动科学发现与技术创新的“超级加速器”。例如,布朗大学团队首次提出物理信息神经网络(PINNs),通过自动微分实现物理约束与数据驱动的结合;NVIDIA实验室提出全球高分辨率气象预报模型FourCastNet,预报时长从几个小时缩短到几秒钟;2025年1月,Baker团队在《Nature》发表研究,利用RFdiffusion算法从头设计出能够高效中和眼镜蛇蛇毒中三指毒素的蛋白质。科学智能(AI for Science)为解决科学问题带来新方法的同时,也对深度学习框架带来诸多新挑战。对科学问题机理化的探索,需要深度学习框架能够具备更加丰富的各类计算表达能力,如高阶自动微分、傅里叶变换、复数运算、高阶优化器等等;此外,如何实现深度学习框架与传统科学计算工具链的协同,也是需要思考的问题。 +为了解决这些挑战,飞桨框架3.0提出了基于组合算子的高阶自动微分技术,如下图所示,该技术的核心思想是将复杂算子(如 log_softmax)拆解为多个基础算子的组合,然后对这些基础算子进行一阶自动微分变换。重要的是,基础算子经过一阶自动微分变换后,其所得的计算图仍然由基础算子构成。通过反复应用一阶自动微分规则,我们可以轻松地获得高阶自动微分的结果。这一机制不仅完美兼容动态图模式和静态图模式,而且在动态图模式下支持 N+1 阶微分的灵活拆分,同时在静态图模式下能够进行高效的编译器融合优化。 +更多关于高阶自动微分和 AI for Science 的信息,请参考文档:[《高阶自动微分功能》](./higher_order_ad_cn.md)。
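The composite-operator idea described above can be illustrated with a toy scalar example. This is a minimal sketch and not Paddle's implementation: the tuple-based expression format and the helper names `evaluate`, `diff`, `softplus`, and `nth_derivative` are assumptions made for illustration only. It decomposes a composite operator, softplus(x) = log(1 + exp(x)), into primitives and shows that every first-order rule returns an expression built from the same primitives, so the rule can simply be applied again to reach higher orders.

```python
import math

# Hypothetical primitive set (illustration only). Expressions are nested tuples:
# ("const", c), ("var",), ("add", a, b), ("mul", a, b),
# ("exp", a), ("log", a), ("recip", a)


def evaluate(e, x):
    """Evaluate expression e at the scalar point x."""
    op = e[0]
    if op == "const":
        return e[1]
    if op == "var":
        return x
    if op == "add":
        return evaluate(e[1], x) + evaluate(e[2], x)
    if op == "mul":
        return evaluate(e[1], x) * evaluate(e[2], x)
    if op == "exp":
        return math.exp(evaluate(e[1], x))
    if op == "log":
        return math.log(evaluate(e[1], x))
    if op == "recip":
        return 1.0 / evaluate(e[1], x)
    raise ValueError(f"unknown primitive: {op}")


def diff(e):
    """First-order rule per primitive; the result uses only the same primitives."""
    op = e[0]
    if op == "const":
        return ("const", 0.0)
    if op == "var":
        return ("const", 1.0)
    if op == "add":
        return ("add", diff(e[1]), diff(e[2]))
    if op == "mul":
        # Product rule: (a * b)' = a' * b + a * b'
        return ("add", ("mul", diff(e[1]), e[2]), ("mul", e[1], diff(e[2])))
    if op == "exp":
        # (exp(u))' = exp(u) * u'
        return ("mul", e, diff(e[1]))
    if op == "log":
        # (log(u))' = (1 / u) * u'
        return ("mul", ("recip", e[1]), diff(e[1]))
    if op == "recip":
        # (1 / u)' = -(1 / u)^2 * u'
        return ("mul", ("mul", ("const", -1.0), ("mul", e, e)), diff(e[1]))
    raise ValueError(f"unknown primitive: {op}")


def softplus():
    """Composite operator decomposed into primitives: log(1 + exp(x))."""
    return ("log", ("add", ("const", 1.0), ("exp", ("var",))))


def nth_derivative(e, n):
    """Apply the first-order transformation n times."""
    for _ in range(n):
        e = diff(e)
    return e


if __name__ == "__main__":
    for order in range(4):
        print(order, evaluate(nth_derivative(softplus(), order), 0.0))
```

At x = 0 the first and second derivatives of softplus are sigmoid(0) = 0.5 and sigmoid(0)(1 - sigmoid(0)) = 0.25, which is what the repeated first-order transformation evaluates to. A real framework would also simplify and fuse the growing expression; this sketch deliberately does neither.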
- +
-然后通过使用 Placement 来表示张量在不同设备上的切分状态,Placement 分为 Replicate、Shard 和 Partial 这 3 种切分状态。如下图所示,Replicate 表示张量在不同设备上会以复制的形式存在;Shard 表示按照特定的维度在不同设备上进行切分;Partial 表示设备上的张量不完整,需要进行 Reduce Sum 或者 Reduce Mean 等不同方式的操作后,才能得到完整的状态。 +基于飞桨框架的高阶自动微分和编译优化技术,实现了方程求解类模型性能的大幅提升,英伟达Modulus的41个不同方程实验显示,飞桨的微分方程求解速度比PyTorch开启编译器优化后的2.6版本平均快 115%。此外,飞桨还实现了傅里叶变换、复数运算、高阶优化器等功能,这些方法在航空航天、汽车船舶、气象海洋、生命科学等多个领域都具有广泛的应用潜力,为科学研究和工程实践提供了有力的支持。在模型层面,我们成功研发了赛桨(PaddleScience)、螺旋桨(PaddleHelix)等系列开发套件,为科学计算提供了更为便捷、高效的解决方案。飞桨对DeepXDE、Modulus 等主流开源科学计算工具进行了广泛适配,并成为 DeepXDE 的默认推荐后端。
- +
-在完成分布式标记抽象后,通过调用`paddle.distributed.shard_tensor()`接口,将一个普通的张量标记成分布式张量。标记出分布式张量后,我们可以像写单卡程序一样,调用算子对分布式张量进行操作。框架底层会根据算子的计算逻辑自动进行必要的数据切分、并行计算和通信操作,以保证分布式计算结果的正确性。 - -自动并行提供了一种高度灵活和方便的张量切分标记方式,可以轻松地实现各种复杂的分布式并行策略。以具体的 Llama 模型训练为例,动态图手动并行的开发方式,它要求开发者不仅要选择合适的并行策略,还必须精心设计通信逻辑;通过采用自动并行的开发方式,开发者无需再考虑复杂的通信逻辑。其分布式训练核心代码量可减少 50%,从而大大降低了开发的难度。未来,我们将进一步探索无需使用张量切分标记的全自动并行,让开发者可以完全不用关心集群拓扑和分布式标记信息,进一步提升大模型的开发体验。关于自动并行功能的更多介绍,请参考以下文档:[《自动并行训练》](./auto_parallel_cn.md) - -## 五、神经网络编译器自动优化 +## 四、神经网络编译器技术,实现框架通用性能提升 -编译器(compiler)是一种计算机程序,负责将用某种编程语言编写的源代码(原始语言)转换成另一种编程语言(目标语言)。以高级编程语言编译器为例,如 gcc,它能将 C 语言代码转换成 CPU 可执行的机器指令。类似地,神经网络编译器,通常也被称为深度学习编译器,是深度学习领域特有的工具,用于将一种神经网络中间表示(IR)转换为另一种中间表示(IR)。例如,飞桨神经网络编译器 CINN,能够将神经网络中间表示转换为其他形式的中间表示,如 CUDA C 语言代码、SyCL 语言代码,或 LLVM IR。之后,利用芯片软件栈提供的编程语言编译器,比如英伟达的 NVCC(NVIDIA CUDA Compiler)编译器或 NVRTC(NVIDIA CUDA Runtime Compilation)运行时编译库,将这些中间表示进一步转换为可在英伟达 GPU 上运行的机器指令。 - -### 5.1 从 RMSNorm 说起 - -为什么在深度学习框架中需要引入编译器技术呢?让我们通过一个实例来阐释这一点。我们以 Llama 模型中经常使用的 RMS Normalization ([Root Mean Square Layer Normalization](https://arxiv.org/abs/1910.07467))为例,其计算公式相对简单明了。 +在众多深度学习的应用场景中,如大模型训练、自动驾驶等,对模型的训练与推理速度均提出了极高的要求。然而,要实现训练与推理速度的提升并非易事,这需要我们紧密结合模型结构与硬件特性,开展大量的工程实现与优化工作。在模型结构层面,模型结构正日益呈现出多样化的趋势,从基础的全连接网络,到复杂的卷积神经网络、循环神经网络、Attention网络、状态空间模型、图神经网络等,每一种模型结构都拥有其独特的计算模式与优化需求。在硬件特性方面,算力的增长速度远远超过了访存性能的提升,访存性能的瓶颈限制了访存密集型算子(如归一化层、激活函数等)的执行效率。特别是,当前市场上硬件平台种类繁多,我们需要投入大量的人力物力,进行针对性的优化工作,这将严重拖慢算法创新和产业应用的速度。 +让我们通过一个实例来阐释这一点。我们以 Llama 模型中经常使用的 RMS Normalization (Root Mean Square Layer Normalization)为例,其计算公式相对简单明了。
@@ -107,118 +110,51 @@ class RMSNorm(paddle.nn.Layer): return x * self.weight ``` -从上述代码中,我们可以清晰地观察到代码与公式之间存在着良好的对应关系。具体来说,代码中的`weight`变量对应于公式中的`g`,`x`变量对应于公式中的`a`。此外,代码中的`pow`函数实现了平方运算,`mean`函数对应公式中的求和取平均操作,而`rsqrt`函数则实现了开根号后取倒数的计算。这种编写方式赋予了代码极高的灵活性和可维护性,使得开发者可以像书写数学公式一样编写代码,从而大大降低了代码的理解成本和维护成本。如果开发者希望采用新的 Normalization 策略,只需简单地修改代码即可实现新的计算公式。 - -尽管这种实现方式非常灵活,但它也面临着一个极具挑战的问题,即执行速度较慢。特别是在处理大型模型时,由于计算量巨大且算力成本高昂,这一问题尤为突出。速度慢的主要原因在于,每一次函数调用都会触发飞桨框架底层的一次算子调用。而算子作为深度学习框架的最小调度和执行单元,在执行过程中需要将显存中的数据搬运到寄存器中进行运算,并将计算结果写回到显存中。这种频繁的显存读写操作导致了计算密度降低,在访存带宽有限的情况下,显著拖慢了程序的运行速度。 - -为了解决这一问题,最简单的方法是,增加一个叫 RMSNorm 的算子,并且提供一个叫 RMSNorm 的 Python 层 API,这一方法在飞桨框架 1.x 版本就可以支持,比如我们采用以下代码实现: - -```python -class RMSNorm(paddle.nn.Layer): - def __init__(self): - super().__init__() - self.variance_epsilon = 1e-6 - self.size = 768 - self.weight = paddle.create_parameter( - shape=[self.hidden_size], - dtype=paddle.get_default_dtype(), - default_initializer=nn.initializer.Constant(1.0), - ) - - def forward(self, x): - return paddle.incubate.nn.functional.fused_rms_norm( - x=x, - norm_weight=self.weight, - norm_bias=None, - epsilon=self.variance_epsilon, - begin_norm_axis=2, - ) -``` - -以上代码通过`fused_rms_norm`开发接口实现了对`rms_norm`的调用,这一改动带来了显著的性能提升,有效解决了之前版本运行速度慢的问题。然而,这一优化方案也带来了不少弊端。 - -最突出的是,它大大提高了开发者的门槛,因为开发者现在需要深入了解和掌握飞桨框架中关于张量、算子等核心概念,并熟悉算子开发与注册的全流程。此外,为了编写出性能优异的 reduce 求和操作,开发者还需精通 CUDA C 高性能程序开发的技巧,并对 Shared Memory、Warp Divergence、Bank Conflict 等高级概念有深刻的理解。 - -其次,该方案会增加框架开发接口的数量、并降低开发接口的可用性和可维护性。由于开发者的需求通常非常灵活多变,比如各种 Normalization 策略的变种,为每个特定操作同步增加一个 Python 开发接口,将导致框架的接口数量迅速增加,目前飞桨框架已经有接近 2000 个对外公开的开发接口。同时,随着算子融合粒度的增大,每个开发接口的参数数量也急剧上升,比如一些融合类的算子开发接口甚至可能包含多达 30 多个参数,这使得开发接口变得难以使用和维护。 - -再者,该方案的一个显著影响是导致框架算子库中的算子数量不断攀升,进而使得硬件适配的成本也随之增加。尽管飞桨框架已经对算子进行了清理和规范,且不考虑融合算子的情况,但当前飞桨框架的算子库仍然包含了超过 800 个算子,这为硬件适配工作带来了极大的挑战。这些新增加的算子需要使用 CUDA C 代码来实现,并且如果希望在其他类型的硬件上运行,还需要开发相应版本的代码。考虑到需要适配的硬件种类繁多,这无疑会大幅增加开发成本。 - -以下是截取的 CUDA C 代码实现片段,从中我们可以看出代码的实现变得复杂了许多。 - -```cpp - const ComputeType row_sum_square = - 
BlockAllReduce(thread_sum_square); - - // use multiply instead of divide. Author(zhengzekang). - ComputeType row_rms = row_sum_square * col_divisor; - ComputeType row_inv_rms = - Rsqrt(row_rms + static_cast(epsilon)); - // save for backward - if (inv_var_data != nullptr) { - inv_var_data[row] = row_inv_rms; - } - for (int pack_id = tid; pack_id < num_packs; pack_id += block_size) { - ComputeType pack[kPackSize]; -#pragma unroll - for (int i = 0; i < kPackSize; ++i) { - pack[i] = static_cast(buf[i * num_packs + pack_id]) * - row_inv_rms; - } - store.template store(pack, row, pack_id * kPackSize); - } -``` - -而借助神经网络编译器技术,我们能够在维持高度灵活性和易用性的基础上,实现性能的显著提升。以下 A100 平台上 RMSNorm 算子的性能测试结果便是一个明证:相较于采用 Python 开发接口组合实现的方式,经过编译优化后的算子运行速度提升了 4 倍;即便与手动算子融合的方式相比,也实现了 14%的性能提升。这一成果充分展示了飞桨框架在灵活性与性能之间寻找到的理想平衡点。 +上述代码开发简单,但是由于存在大量的访存操作导致性能很差,且显存占比较多;为了突破访存瓶颈,开发者可以选择通过手写CUDA代码的方式实现一个融合的 FusedRMSNorm算子,但是对于开发者要求更高,开发成本也更高,更重要的是这种方式极大的降低了可维护性和灵活性。 +为此,飞桨框架3.0研制了神经网络编译器CINN(Compiler Infrastructure for Neural Networks),相比于PyTorch 2.0的Inductor加Triton的两阶段编译方案,CINN支持直接从神经网络中间表述编译生成CUDA C代码,通过一阶段的编译方案,CINN避免了两阶段编译由于中间表示信息传递和表达能力限制所造成的信息损失,具备更通用的融合能力和更好的性能表现。具体一些技术创新如下: +1) 以 Reduce 为核心的算子融合技术。摒弃传统的粗粒度 pattern 匹配模式,支持维度轴自动变换对齐融合,在保证计算正确性的同时,具有更强的算子融合能力,带来更大的性能优化潜力。 +2) 动静态维度的高效后端 Kernel 调优技术。算子全面支持reduce、broadcast、transpose等多种算子的不同组合方式,针对各类算子组合和数据类型,自适应不同维度大小与不同硬件配置,进行全场景高效调优。通过自动向量化提高BF16、FP16等小数据类型的访存效率。通过分析与分桶机制,实现动静态运行时配置生成,根据运行时的硬件配置,在无需profiling的情况下生成高效的kernel。 +3) 动态维度的复杂表达式化简技术。建立了分层化简体系,Lower、Schedule、CodeGen 阶段执行不同等级化简方法,解决传统化简方法中多场景叠加后化简困难、化简不彻底问题。实现了复杂表达式结构化简,抽取融合算子经过编译、调优后的固定子结构进行专项化简,且灵活支持自定义化简方法。
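To make the memory-traffic argument behind operator fusion concrete, here is a small pure-Python sketch. It is an illustration of the general idea only and does not reflect the code CINN actually generates; the function names `rms_norm_unfused` and `rms_norm_fused` are hypothetical. The unfused version mirrors an API-by-API implementation (pow, mean, rsqrt, multiply) in which each step materializes a full intermediate buffer, while the fused version performs one reduction pass and one output pass.

```python
import math


def rms_norm_unfused(x, weight, eps=1e-6):
    """Each step writes a full intermediate list, like separate kernels would."""
    squared = [v * v for v in x]                    # pow: intermediate buffer 1
    mean = sum(squared) / len(squared)              # mean: reduction over buffer 1
    inv_rms = 1.0 / math.sqrt(mean + eps)           # rsqrt
    scaled = [v * inv_rms for v in x]               # multiply: intermediate buffer 2
    return [s * w for s, w in zip(scaled, weight)]  # multiply by weight: output


def rms_norm_fused(x, weight, eps=1e-6):
    """One reduction pass plus one output pass, with no intermediate lists."""
    total = 0.0
    for v in x:                                     # reduction pass
        total += v * v
    inv_rms = 1.0 / math.sqrt(total / len(x) + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]  # single output pass


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    gain = [1.0, 1.0, 1.0, 1.0]
    print(rms_norm_unfused(data, gain))
    print(rms_norm_fused(data, gain))
```

Both functions compute the same values up to floating-point rounding. On a GPU the difference is that the unfused form reads and writes global memory once per step, which is the access bottleneck that a Reduce-centered fusion strategy is meant to remove.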
- +
-### 5.2 飞桨神经网络编译器 CINN - -飞桨神经网络编译器 CINN 采用了与框架一体化的设计,其基础设施是基于飞桨的高扩展中间表示 PIR。这一设计使得 CINN 能够同时支持训练和推理过程,并且具备处理动态可变形状输入的能力。在生成式大语言模型 Llama 和文生图模型 Stable Diffusion 上的实验结果显示,通过使用编译器的优化技术,相较于未采用手动性能优化的基础版本,推理速度分别实现了 36%和 30%的提升。那么,编译器究竟是如何实现深度学习任务的加速呢?以下,我们将通过一个由 Add 和 Relu 算子组成的例子来具体展示这一过程。 +借助神经网络编译器技术,我们能够在维持高度灵活性和易用性的基础上,实现性能的显著提升。以下 A100 平台上 RMSNorm 算子的性能测试结果便是一个明证:相较于采用 Python 开发接口组合实现的方式,经过编译优化后的算子运行速度提升了 4 倍;即便与手动算子融合的方式相比,也实现了 14%的性能提升。这一成果充分展示了飞桨框架在灵活性与性能之间寻找到的理想平衡点。我们在PaddleX 开发套件里选取了超过 60 模型进行实验,使用 CINN 编译器后超 60% 模型有显著性能提升,平均提升达 27.4%。重点模型相比 PyTorch 开启编译优化后的版本平均快18.4%。
- +
-首先,该过程会利用组合算子机制,将原始的计算图拆解为由一系列基础算子构成的计算图,并在此过程中详细记录算子输入输出张量之间的形状关系,以确保其能够适应动态形状张量的复杂情况。随后,在神经网络编译器的前端部分,编译器会进行智能判断,识别出哪些基础算子具备融合潜力。对于这些可融合的基础算子,编译器会进一步调用基础的 Compute 函数,巧妙地将它们降级为由抽象语法树(AST)构成的低层中间表示(IR)。接下来,在神经网络编译器的后端部分,这些中间表示会被进一步精心转换成具体的代码实现,这既可能是 CUDA C 代码,也可能是 LLVM IR 代码,具体取决于目标平台的需求。最终,利用 NVCC 编译器或 LLVM 编译器,将这些代码转换成能够在芯片上高效运行的可执行代码,从而实现深度学习任务的显著加速。 - 更多关于神经网络编译器的信息,请参考文档[《神经网络编译器》](./cinn_cn.md)。 -## 六、高阶自动微分 +## 五、标准化统一硬件适配,加速软硬协同优化 -深度学习模型的训练过程,核心在于利用随机梯度下降(SGD)等优化算法来更新模型参数。在此过程中,深度学习框架的自动微分功能扮演着至关重要的角色,它基于链式法则自动计算出损失函数相对于模型参数的梯度。尽管在大多数深度学习任务中,仅需计算一阶导数,但在某些“AI for Science”的应用场景中,却需要计算高阶导数,这无疑大大增加了自动微分的复杂性。以 2D 矩形平板分布受载问题为例,该问题的内在机理需借助 4 阶微分方程来描述。因此,为了求解这类问题,深度学习框架必须提供高阶自动微分功能。然而,实现高阶自动微分面临着诸多挑战。具体来说,框架需要为每个算子编写高阶微分规则,而随着阶数的增加,这些微分规则的复杂性也随之上升。当阶数达到三阶或更高时,编写这些规则不仅变得极其困难,而且其正确性也难以保证。为了解决这一难题,我们提出了基于基础算子组合的高阶自动微分技术。该技术的核心思想是将复杂算子(如 log_softmax)拆解为多个基础算子的组合,然后对这些基础算子进行一阶自动微分变换。重要的是,基础算子经过一阶自动微分变换后,其所得的计算图仍然由基础算子构成。通过反复应用一阶自动微分规则,我们可以轻松地获得高阶自动微分的结果。 +深度学习框架在实现高效能计算的过程中,还面临着一个关键性挑战,即如何实现与各类硬件的有效适配。在深度学习的创新探索与产业落地进程中,单一芯片往往难以满足复杂多变的业务需求,因此通常需要融合运用多种芯片来构建解决方案。大模型应用对于算力的需求极为庞大,而单一芯片的供应数量有限,远不足以支撑大模型的高效运行。不仅如此,不同场景对芯片性能有着差异化的严苛要求,单一芯片更是难以全面满足。例如,在大模型训练场景中,需要芯片具备大显存、高带宽以及高可靠性的特性;自动驾驶场景则强调低时延与高可靠性,以保障行车安全;端侧场景则聚焦于低功耗,以延长设备的续航时间。 +飞桨自发布之初就考虑了多硬件适配的需求,历经持续迭代与演进,3.0版本构建了一套成熟且完善的多硬件统一适配方案: +* 首先,飞桨聚焦于硬件接口的抽象。飞桨将硬件接口细分为设备管理、计算执行、分布式通信等多个类别,通过标准化的硬件接口成功屏蔽了不同芯片软件栈开发接口之间的差异。通过合理的抽象,减少了适配所需的接口数量,以昇腾芯片适配为例,初步跑通所需适配接口数比PyTorch方案减少56%,适配代码量减少80%。 +* 其次,基于标准化适配接口的定义,飞桨实现了松耦合、可插拔的架构。在此架构下,每类芯片仅需提供标准化适配接口的具体实现,便能轻松融入飞桨后端,极大地简化了芯片接入的流程。 +* 再者,考虑到不同芯片软件栈成熟度的差异,飞桨提供了丰富多样的接入方式,涵盖算子开发、算子映射、图接入、编译器接入等。针对大模型训练与推理需求,飞桨还具备全栈优化能力,如支持动静统一编程范式、超大规模分布式训练技术,提高了模型开发与部署效率。 +* 最后,飞桨与芯片厂商携手合作,共同构建了官方代码合入机制、例行发版机制和持续集成测试等研发基础设施,还建立了日级别例行功能与精度监测,保障开发者使用体验。 +这些举措提升了研发效率,确保飞桨与各类芯片的适配工作高效、稳定推进。
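The loosely coupled, pluggable design described in the list above can be sketched as an abstract interface plus a registry. Everything in this block is hypothetical: `DeviceBackend`, `register_backend`, `get_backend`, and `FakeCpuBackend` are names invented for illustration and are not Paddle's adaptation interfaces. The three abstract methods loosely follow the categories named in the text (device management, compute execution, distributed communication).

```python
from abc import ABC, abstractmethod


class DeviceBackend(ABC):
    """Hypothetical standardized interface that each chip vendor implements."""

    @abstractmethod
    def device_count(self):
        """Device management: number of visible devices."""

    @abstractmethod
    def run_kernel(self, name, *args):
        """Compute execution: run a named kernel on this backend."""

    @abstractmethod
    def all_reduce_sum(self, values):
        """Distributed communication: sum-reduce one value per rank."""


_BACKENDS = {}


def register_backend(name, backend):
    """Plug a backend in; the framework core only sees the abstract interface."""
    if not isinstance(backend, DeviceBackend):
        raise TypeError("backend must implement DeviceBackend")
    _BACKENDS[name] = backend


def get_backend(name):
    """Look up a previously registered backend by name."""
    return _BACKENDS[name]


class FakeCpuBackend(DeviceBackend):
    """Toy backend used only to exercise the registry."""

    def device_count(self):
        return 1

    def run_kernel(self, name, *args):
        kernels = {
            "add": lambda a, b: a + b,
            "relu": lambda a: max(a, 0.0),
        }
        return kernels[name](*args)

    def all_reduce_sum(self, values):
        return float(sum(values))


register_backend("fake_cpu", FakeCpuBackend())

if __name__ == "__main__":
    backend = get_backend("fake_cpu")
    print(backend.device_count(), backend.run_kernel("add", 1.0, 2.0))
```

The design point is that code above the interface never names a specific chip: supporting new hardware means registering one more implementation rather than changing callers.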
- -
- -
- +
-为了全面支持高阶自动微分,飞桨框架精心设计与实现了一套组合算子机制。这一机制不仅完美兼容动态图模式和静态图模式,而且在动态图模式下支持 N+1 阶微分的灵活拆分,同时在静态图模式下能够进行高效的编译器融合优化。我们创新性地设计并实现了动静一体的算子组合规则,这意味着同一套组合规则在动态图和静态图两种模式下均可无缝复用,从而有效避免了重复开发的繁琐。在构建基础算子体系时,我们以 Tensor 作为核心操作对象,严格确保了算子的原子性、实用性和完备性。此外,我们还提供了自定义反向操作和自动重计算功能,这些强大的特性不仅显著提升了模型的精度,还有效地减少了显存占用,为用户带来了更高效、更灵活的深度学习体验。 - -基于前期的工作积累,飞桨已开始积极探索科学智能(AI for Science)领域的相关工作。为了满足 AI for Science 任务的多样化需求,飞桨在框架层面实现了基于组合算子的高阶自动微分功能,并专门提供了针对科学计算的开发接口。此外,我们还实现了高阶优化器,如 LBFGS 等,以进一步提升科学计算的性能。在模型层面,我们成功研发了赛桨(PaddleScience)、螺旋桨(PaddleHelix)等系列开发套件,为科学计算提供了更为便捷、高效的解决方案。飞桨对国内外主流开源科学计算工具进行了广泛适配,如 DeepXDE、Modulus 等,并成为国际主流的科学计算深度学习库 DeepXDE 的默认推荐后端。在与 NVIDIA 合作适配 Modulus 的过程中,我们充分利用飞桨框架的高阶自动微分与编译优化技术,实现了方程求解类模型性能的大幅优化。相比 Modulus 现有的后端求解速度,我们的平均提升幅度达到了 71%。我们实现了物理信息网络(PINN)、傅里叶算子学习(FNO)等数据驱动、机理驱动以及数据机理融合的方法。这些方法在航空航天、汽车船舶、气象海洋、生命科学等多个领域都具有广泛的应用潜力,为科学研究和工程实践提供了有力的支持。 - -更多关于高阶自动微分和 AI for Science 的信息,请参考文档:[《高阶自动微分功能》](./higher_order_ad_cn.md)。 - -## 七、高扩展中间表示 PIR +基于前述技术,飞桨与芯片厂商紧密合作,携手共建蓬勃发展的硬件生态,当前飞桨已与超过40家成员单位开展合作,适配超过60个芯片系列。飞桨已与22家硬件厂商伙伴达成深度合作,共同推出了飞桨生态发行版,这标志着双方合作迈向了新的高度。飞桨能够有效屏蔽底层硬件之间复杂多样的差异,为开发者提供简洁易用的开发接口。开发者只需编写一份代码,就可以让程序在不同芯片上顺畅运行,轻松实现业务的跨芯片迁移。这种卓越的跨平台能力,为业务在芯片选择方面带来了前所未有的灵活性,使开发者能够根据实际需求,更加自由、高效地规划业务部署。 -在通过动静转换技术获取计算图表示后,我们仍需对计算图进行一系列优化,如自动微分变换、分布式变换以及编译器加速等。为实现这些优化,我们需要一种“高扩展中间表示”PIR(Paddle Intermediate Representation)。PIR 具备灵活的基础组件,支持 Operation、Value、Attribute 等元素的定义,从而便于进行扩展。其中,Dialect 定义是 PIR 的核心组成部分,它类似于形式化语言中的一种表达,能够表示一个相对完整的体系,并支持开发者根据需求定制化扩展 Dialect,显著提升了框架的扩展性,这个体系涵盖了分布式、编译器、动态形状推理与控制流等多个方面。PIR 遵循 SSA(即 Static Single Assignment)原则,统一了顶层结构,实现“算子顺序性”和“计算图语义”的兼容表示。此外,PIR 还提供了更加简洁、低成本的 Pass 开发体系,并内置了一系列丰富且功能完备的 Pass 优化策略,为大模型的极致性能优化提供了强有力支撑。PIR 提供了 DRR 和 Pattern Rewriter 两种机制,以实现 IR 的灵活变化。为了验证 PIR 的有效性,我们比较了超过 900 个模型配置在使用 PIR 后的推理速度提升情况。结果显示,25%的模型推理速度提升了超过 30%,60%的模型提升了超过 10%。总体而言,使用 PIR 后,推理整体性能提升了超过 10%。这一显著提升主要归功于新 PIR 能够提前静态选择 Kernel,从而降低了调度成本和开销。此外,常量折叠策略的应用范围更广,Inplace Pass 策略机制也得到了更广泛的应用。采用新的 PIR 表示机制后,我们可以实现训推一体,展现出优异的性能和表现。更多关于 PIR 的信息,请参考文档:[《PIR 
基本概念和开发》](./paddle_ir_cn.md)。 +## 总结 -## 八、多硬件适配 +总的来说,飞桨框架3.0面向大模型、异构多芯进行专属设计,向下适配异构多芯,充分释放硬件潜能;向上一体化支撑大模型的开发、训练、压缩、推理、部署全流程,并助力科学前沿探索。具备动静统一自动并行、大模型训推一体、科学计算高阶微分、神经网络编译器、异构多芯适配五大新特性。 +* 动静统一自动并行:用户只需在单卡基础上进行少量的张量切分标记,飞桨能自动寻找最高效的分布式并行策略,大幅度降低了产业开发和训练的成本,使开发者能够更专注于模型和算法的创新。 +* 大模型训推一体:同一套框架支持训练和推理,实现训练、推理代码复用和无缝衔接,为大模型的全流程提供了统一的开发体验和极致的训练效率,为产业提供了极致的开发体验。 +* 科学计算高阶微分:科学计算提供了高阶自动微分、复数运算、傅里叶变换、编译优化、分布式训练等能力支撑,支持数学、力学、材料、气象、生物等领域科学探索,微分方程求解速度比PyTorch开启编译器优化后的2.6版本平均快 115%。 +* 神经网络编译器:采用与框架一体化的设计,能够支持生成式模型、科学计算模型等多种模型的高效训练与可变形状推理,在计算灵活性与高性能之间提供了良好的平衡点,显著降低了性能优化的成本。 +* 异构多芯适配:构建了一套成熟且完善的多硬件统一适配方案,通过标准化接口屏蔽了不同芯片软件栈开发接口差异,实现可插拔架构,提供多种接入方式和基础设施,支撑硬件厂商合入4001个PR,包括26584个commits。 +综上所述,飞桨框架3.0将为开发者提供一个“动静统一、训推一体、自动并行、自动优化、广泛硬件适配”的深度学习框架,开发者可以像写单机代码一样写分布式代码,无需感知复杂的通信和调度逻辑,即可实现大模型的开发;可以像写数学公式一样用 Python 语言写神经网络,无需使用硬件开发语言编写复杂的算子内核代码,即可实现高效运行。目前3.0正式版本已面向开发者开放,并且兼容2.0版本的开发接口,非常欢迎广大开发者使用和反馈。 -深度学习框架在实现高效能计算的过程中,还面临着一个关键性挑战,即如何实现与各类硬件的有效适配。为了应对这一挑战,飞桨框架采取了全面的策略,并成功实现了多种不同的接入方式,以确保能够灵活满足不同芯片的适配需求。通过这些多样化的接入方法,飞桨框架不仅提升了深度学习应用的性能,还确保了广泛的硬件兼容性,从而为开发者提供了一个强大且灵活的工具,以适应不断变化的计算环境和需求。特别是针对大模型场景,飞桨提供了标准化硬件适配接口,只需要适配 30 余个接口,即可全面支持大模型训压推全流程;通过基础算子体系,减少硬件适配所需开发的算子数量;支持算子融合、显存复用等方式对大模型进行性能优化;支持通过神经网络编译器代码后端 CodeGen 的方式进行适配,实现算子自动融合和性能优化。 - -
- -
- -基于前述的先进技术,飞桨与芯片厂商携手,共同打造一个繁荣的硬件生态。这一过程可划分为三个核心阶段。首先是“共聚”阶段,我们联合多家芯片厂商,共同发起了飞桨硬件生态圈。其次是“共研”阶段,与芯片厂商携手实现软硬一体的联合优化。最后是“共创”阶段,与芯片厂商深度合作,共创繁荣生态。至今,我们已与 22 家硬件厂商伙伴成功联合推出了飞桨生态发行版,标志着合作的深入与成果的显现。同时,我们的生态圈已吸引超过 40 家成员单位加入,覆盖了主流硬件厂商,提供了极为丰富的硬件支持框架,为用户带来更加多样化的选择。 - -## 九、开始使用 +## 开始使用 接下来,欢迎大家使用飞桨框架 3.0 正式版,并给我们反馈。在开始使用前,确认已安装飞桨框架 3.0 正式版。下面,我们通过一个矩阵乘和 Softmax 组成的例子来展示飞桨新一代框架是如何实现动静统一自动并行和编译器自动优化性能的。具体代码如下所示: @@ -272,7 +208,7 @@ for data in loader(): print('loss', loss, flush=1) ``` -因为一些功能还在开发中,为了避免对用户造成干扰,当前我们没有默认开启高扩展中间表示 PIR 和神经网络编译器自动优化功能,在开始执行前,我们需要进行环境变量设置以确保新功能生效,如下: +由于在部分极端场景下编译器可能会引起性能退化,为了避免对用户造成干扰,当前我们没有默认开启神经网络编译器自动优化功能,在开始执行前,我们需要进行环境变量设置以确保新功能生效,如下: ```cpp # 打开组合算子 @@ -280,15 +216,6 @@ export FLAGS_prim_enable_dynamic=true && export FLAGS_prim_all=true # 打开 CINN 编译器相关 FLAG export FLAGS_use_cinn=true -export FLAGS_cinn_new_group_scheduler=true -export FLAGS_group_schedule_tiling_first=true -export FLAGS_cinn_bucket_compile=true - -# 打开 PIR 模式 -export FLAGS_enable_pir_api=true - -# 是否打印 Program IR 信息 -export FLAGS_print_ir=false # 执行命令 # python -u -m paddle.distributed.launch --gpus "0,1" test_demo.py diff --git a/docs/guides/paddle_v3_features/paddle_trt_cn.md b/docs/guides/paddle_v3_features/paddle_trt_cn.md new file mode 100644 index 00000000000..3785d0841c7 --- /dev/null +++ b/docs/guides/paddle_v3_features/paddle_trt_cn.md @@ -0,0 +1,210 @@ +# Paddle Inference(TensorRT子图引擎) + +- [Paddle Inference(TensorRT子图引擎)](#gpu-tensorrt-加速推理) + - [1. 概要](#1-概要) + - [2. 环境准备](#2-环境准备) + - [3. API 使用介绍](#3-api-使用介绍) + - [4. 低精度和量化推理](#4-低精度和量化推理) + - [5. Paddle Inference 适配 TensorRT 原理介绍](#5-paddle-inference-适配-tensorrt-原理介绍) + - [6. 基于pdmodel格式的旧架构 TensorRT 推理](#6-基于pdmodel格式的旧架构-TensorRT-推理) + + + +## 1. 
概要 + +TensorRT 是一个针对 NVIDIA GPU 及 Jetson 系列硬件的高性能机器学习推理 SDK,可以使得深度学习模型在这些硬件上的部署获得更好的性能。Paddle Inference 以子图方式集成了 TensorRT,将可用 TensorRT 加速的算子组成子图供给 TensorRT,以获取 TensorRT 加速的同时,保留 PaddlePaddle 即训即推的能力。在这篇文章中,我们会介绍基于 Paddle3.0 中间表示(PIR)的TensorRT推理(PIR-TRT) + +PIR-TRT 功能实现主要由俩个部分组成,PIR-TRT 转换阶段和 PIR-TRT 推理阶段。在 PIR-TRT 转换阶段,原始模型(后缀为 **.json** 的模型文件)被加载后,神经网络被表示为由运算节点和其输入输出组成的 PIR 图,PIR-TRT Converter组件会对 PIR 图进行分析同时发现图中可以使用 TensorRT 优化的子图,并使用 TensorRT 节点替换它们,然后将带有 TensorRT 节点的图序列化下来。在模型的推理阶段,Paddle Inference 加载上述序列化后的模型,如果遇到 TensorRT 节点,Paddle Infenrence 会调用 TensorRT 对该节点进行执行,其它节点调用 GPU 原生推理。TensorRT 除了有常见的 OP 融合以及显存/内存优化外,还针对性地对 OP 进行了优化加速实现,降低推理延迟,提升推理吞吐。 + + +PIR-TRT 支持动态 shape 输入,动态 shape 可用于输入 size 任意变化的模型,如动态 shape 的图像模型(FCN, Faster-RCNN)、 NLP 的 Bert/Ernie 等模型,当然也支持包括静态 shape 输入的模型。 PIR-TRT 支持FP32、FP16、INT8 等多种计算精度,支持服务器端GPU,如T4、A30,也支持边缘端硬件,如 Jetson NX、 Jetson Nano、 Jetson TX2 等。 + + + +## 2. 环境准备 + +要支持 PIR-TRT 功能,需要安装 CUDA、cuDNN、TensorRT 和对应版本的 Paddle 安装包。 +关于这几个软件的安装版本,请参考如下建议(原因:CUDA、cuDNN、TensorRT 版本众多,且有严格的版本对应关系): + +- 电脑上 CUDA、cuDNN、TensorRT 都还没安装的开发者,建议使用 Paddle 提供的 [docker 镜像安装方式](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/docker/linux-docker.html)。 +- 电脑上已安装 CUDA、cuDNN,但没有安装 TensorRT,建议参考 Paddle 提供的cuda、cudnn的对应版本的TensorRT版本去安装TensorRT。 +- 电脑上已安装 CUDA、cuDNN、TensorRT的开发者,去下载对应版本的 Paddle 安装包。 + - 如果 Paddle 安装包没有对应版本的,一种方式是按照 Paddle 提供的安装包信息重新安装CUDA、cuDNN、TensorRT,一种是自己源码编译对应电脑上 CUDA、cuDNN、TensorRT 版本的 Paddle 包。从工程难易程度,建议选择第一种方案。 + +如果您需要安装 [TensorRT](https://developer.nvidia.com/nvidia-tensorrt-8x-download),请参考 [TensorRT 文档](https://docs.nvidia.com/deeplearning/tensorrt/archives/index.html)。 + +关于 Paddle 的安装包,可以参考[Linux下PIP安装Paddle](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/pip/linux-pip.html) + +关于 Paddle 源码编译,可以参考[Linux 下使用 make 从源码编译](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/compile/linux-compile-by-make.html) + + +**Note:** + +1. 
源码编译的时候,需要设置编译选项 WITH_TENSORRT 为 ON。另外可以设置编译选项 TENSORRT_ROOT 为 指定的 TensorRT SDK 的根目录,如果不设置将采用默认目录("/usr")。 +2. 请确保 Python 版本的 TensorRT 正确安装。如果是从源码编译 Paddle 安装包,你可以设置编译选项 WITH_PIP_TENSORRT 为 ON ,在安装 Paddle whl包的时候系统会自动搜寻默认目录下 C++ 版本 TensorRT SDK,并自动安装对应 Python 版本的 TensorRT。 +3. 当前 3.0 版本的 PIR-TRT 并不支持在 Windows 进行 TensorRT加速推理,如果需要在 Windows 上进行 TensorRT 加速,需要使用 Paddle 2.x 加速方式,请参考第6小节内容。 +4. 推荐使用的 TensorRT 的版本在 8.6 及以上,低于 8.5 版本的 TensorRT 功能将不可用。 + + + + +## 3. API 使用介绍 + +详细的 API文档请参考[Paddle-TensorRT接口类](https://www.paddlepaddle.org.cn/inference/v3.0/api_reference/python_api_doc/Paddle_TensorRT_interface.html) + +PIR-TRT 功能实现分为俩个步骤,即模型转换(convert)阶段和运行推理阶段。 + +模型convert阶段作用将原始 PIR 表示模型结构(后缀为.json的模型文件)转换为带有TensorRT能力的 PIR 表示模型结构,在这个阶段中,Converter组件会对 PIR 图进行分析同时发现图中可以使用 TensorRT 优化的子图,并使用 TensorRT 节点替换它们,然后将带有 TensorRT 节点的图序列化下来。一个典型的convert阶段代码如下所示: + +```python + import numpy as np + import paddle + import paddle.nn.functional as F + from paddle import nn + from paddle.tensorrt.export import Input, TensorRTConfig + + class LinearNet(nn.Layer): + def __init__(self, input_dim): + super().__init__() + self.linear = nn.Linear(input_dim, input_dim) + + def forward(self, x): + return F.relu(self.linear(x)) + + input_dim = 3 + # 1.Instantiate the network. + layer = LinearNet(input_dim) + + save_path = "/tmp/linear_net" + # 2.Convert dynamic graph to static graph and save as a JSON file. 
+ paddle.jit.save(layer, save_path, [paddle.static.InputSpec(shape=[-1, input_dim])]) + + # 3.Create TensorRTConfig + input_config = Input( + min_input_shape=[1, input_dim], + optim_input_shape=[2, input_dim], + max_input_shape=[4, input_dim] + ) + + trt_config = TensorRTConfig(inputs=[input_config]) + trt_config.save_model_dir = "/tmp/linear_net_trt" + + # 4.Perform TensorRT conversion + paddle.tensorrt.convert(save_path, trt_config) + +``` + +示例中,步骤1和2过程是准备一个用来跑 TensorRT 加速推理的模型,这里创建了一个简单的动态图模型并且使用[动转静](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/jit/index_cn.html)方式保存下来为后续推理使用。步骤3创建了一个TensorRTConfig,用来给 TensorRT 做一些基础设置,这里Input设置了运行 TensorRT 所必须的输入min/opt/max shape,save_model_dir用于指定了convert后模型保存的路径。 + +在运行推理阶段,主要是通过使用convert后的模型进行推理,来获得 TensorRT 加速效果。在[上一节](https://www.paddlepaddle.org.cn/inference/v3.0/guides/nv_gpu_infer/gpu_native_infer.html)中,我们了解到 Paddle Inference 推理简介(对 Paddle Inference 不熟悉请参考[这里](https://www.paddlepaddle.org.cn/inference/v3.0/guides/introduction/index_intro.html)包含了以下六步: + +- 导入包 +- 设置 Config +- 创建 Predictor +- 准备输入 +- 执行 Predictor +- 获取输出 + +Paddle Inference 中推理阶段使用 TensorRT 加速也是遵照这样的流程,仅仅需要将 Config 中的加载的模型替换为上一步我们convert后保存的模型即可。示例代码如下: + + +```python + import paddle + import numpy as np + import paddle.inference as paddle_infer + + # 5.Create a Predictor and run TensorRT inference. + config = paddle_infer.Config( + '/tmp/linear_net_trt.json', + '/tmp/linear_net_trt.pdiparams', + ) + config.enable_use_gpu(100, 0) + predictor = paddle_infer.create_predictor(config) + + input_data = np.random.randn(2, 3).astype(np.float32) + model_input = paddle.to_tensor(input_data) + + output_converted = predictor.run([model_input]) +``` + + + + +## 4. 
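Low-precision and quantized inference
+
+After a deep learning model has been trained, its weights are redundant to some degree, and for many tasks inference can run at low precision or with quantization without hurting model accuracy. This reduces memory traffic and improves compute efficiency, and it also lowers GPU memory usage. TensorRT-accelerated inference supports FP32 and FP16 as well as INT8 quantized inference. Before choosing a precision, check the [hardware precision matrix](https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#hardware-precision-matrix) to confirm that your GPU supports it.
+
+### FP16 inference
+
+To run mixed-precision inference with half precision in TensorRT, set the precision-type parameter to half precision.
+Taking the code example from Section 3, setting `precision_mode` on `TensorRTConfig` is all that is needed to enable FP16 inference.
+```python
+from paddle.tensorrt.export import PrecisionMode
+
+trt_config.precision_mode = PrecisionMode.FP16
+```
+
+### INT8 quantized inference
+
+INT8 quantized inference takes two steps: (1) produce a quantized model; (2) run TensorRT-accelerated inference with the quantized model. The full workflow for INT8 quantized inference with PIR-TRT is described below.
+
+**1. Produce a quantized model**
+
+PIR-TRT currently supports quantized models produced by the model compression library PaddleSlim. PaddleSlim supports both offline quantization and online quantization (quantization-aware training). Offline quantization needs no retraining and is simple to use, but accuracy may be affected after quantization. Quantization-aware training affects model accuracy less, but requires retraining the model and has a somewhat higher barrier to entry. For details on producing a quantized model with PaddleSlim, see:
+
+ - Offline quantization [quick start tutorial](https://github.com/PaddlePaddle/PaddleSlim/blob/release/2.6/docs/zh_cn/quick_start/static/quant_post_static_tutorial.md)
+ - Offline quantization [API reference](https://github.com/PaddlePaddle/PaddleSlim/blob/release/2.6/docs/zh_cn/api_cn/static/quant/quantization_api.rst)
+ - Offline quantization [demo](https://github.com/PaddlePaddle/PaddleSlim/tree/release/2.6/demo/quant/quant_post)
+ - Quantization-aware training [quick start tutorial](https://github.com/PaddlePaddle/PaddleSlim/blob/release/2.6/docs/zh_cn/quick_start/dygraph/dygraph_quant_aware_training_tutorial.md)
+ - Quantization-aware training [API reference](https://github.com/PaddlePaddle/PaddleSlim/blob/release/2.6/docs/zh_cn/api_cn/dygraph/quanter/qat.rst)
+ - Quantization-aware training [demo](https://github.com/PaddlePaddle/PaddleSlim/tree/release/2.6/demo/quant/quant_aware)
+
+To quickly try an inference model quantized with PaddleSlim, see the [automatic compression tool](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression).
+
+**2. 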
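Run TensorRT INT8 inference with the quantized model**
+
+To load a quantized model for TensorRT INT8 inference, set `precision_mode` on `TensorRTConfig` when specifying the TensorRT configuration. The rest of the PIR-TRT workflow stays the same, and INT8 inference is enabled.
+```python
+from paddle.tensorrt.export import PrecisionMode
+
+trt_config.precision_mode = PrecisionMode.INT8
+```
+
+## 5. How Paddle Inference integrates TensorRT
+
+PIR-TRT integrates TensorRT in the form of subgraphs. Once a model is loaded, the neural network can be represented as a PIR computation graph made up of operation nodes and their inputs and outputs. PIR-TRT scans the whole graph, finds the subgraphs that TensorRT can optimize, and replaces them with TensorRT nodes. During inference, when a TensorRT node is encountered, Paddle Inference calls the TensorRT library to optimize that node, while the other nodes use Paddle Inference's native GPU implementation. During inference TensorRT can fuse Ops horizontally and vertically, filter out redundant Ops, and select suitable kernels for specific Ops on a specific platform, all of which speeds up model inference.
+
+The figures below illustrate this process with a simple model:
+
+**Original network**
+
+![model_original](./images/paddle-trt/model_original.png)
+
+**Converted network**
+
+![model_trt](./images/paddle-trt/model_trt.png)
+
+The original network is a simple one composed of operators such as matmul, add, and relu. PIR-TRT inspects the network, selects the matmul, add, relu, and similar operators as a convertible subgraph, and replaces it with a single TensorRT node, which becomes the **tensorrt_engine** node in the converted network. A combine node is added before that node to gather the inputs and pass them to tensorrt_engine, and a split node is added after it to distribute the outputs to the other nodes. While the network runs, whenever tensorrt_engine is encountered, Paddle Inference calls TensorRT to execute it.
+
+## 6. 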
基于pdmodel模型格式的 TensorRT 推理 + +在Paddle 3.0 版本之后,飞桨底层升级为全新的 PIR 架构,保存的模型结构以.json后缀的模型为主。虽然 3.0 进行了全面升级,但是出于兼容性考虑依然保留着旧架构的功能。 + +如果是使用 Paddle 3.0 新架构下保存的模型(.json后缀)进行 TensorRT 推理加速,则需要参考本章节介绍的 PIR-TRT 使用方法。 + +如果想在 Paddle 3.0 下对于保存的模型后缀为.pdmodel格式进行 TensorRT 加速推理,可以参考 [Paddle 2.x TensorRT 推理文档](https://www.paddlepaddle.org.cn/inference/v2.6/guides/nv_gpu_infer/gpu_trt_infer.html)。 + +如果想在 Paddle 3.0 下对于保存的模型后缀为.pdmodel格式进行 TensorRT 低精度加速推理,可以参考 [Paddle 2.x TensorRT 低精度推理文档](https://www.paddlepaddle.org.cn/inference/v2.6/guides/nv_gpu_infer/trt_fp16_int8.html)。 \ No newline at end of file diff --git a/docs/hardware_support/dcu/install_cn.md b/docs/hardware_support/dcu/install_cn.md index a32f38adb58..fdc3e9ae2b1 100644 --- a/docs/hardware_support/dcu/install_cn.md +++ b/docs/hardware_support/dcu/install_cn.md @@ -14,21 +14,23 @@ ## 运行环境准备 -推荐使用飞桨官方发布的海光 DCU 开发镜像,该镜像预装有海光 DCU 基础运行环境库(DTK)。 +您可以基于 docker、pip、源码等不同方式准备飞桨开发环境 + +### 基于 Docker 的方式(推荐) + +我们推荐使用飞桨官方发布的海光 DCU 开发镜像,该镜像预装有海光 DCU 基础运行环境库(DTK)和飞桨 3.0 版本的 SDK。 ```bash # 拉取镜像 -docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle-dcu:dtk24.04.1-kylinv10-gcc82 +docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-dcu:3.0.0-dtk24.04.1-kylinv10-gcc82-py310 ``` ```bash # 启动容器 docker run -it --name paddle-dcu-dev -v $(pwd):/work \ -w=/work --shm-size=128G --network=host --privileged \ - --device=/dev/kfd --device=/dev/dri --ipc=host --group-add video \ - -u root --ulimit stack=-1:-1 --ulimit memlock=-1:-1 -v /opt/hyhal:/opt/hyhal \ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \ - ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle-dcu:dtk24.04.1-kylinv10-gcc82 /bin/bash + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-dcu:3.0.0-dtk24.04.1-kylinv10-gcc82-py310 /bin/bash ``` #### 选项说明及可调整参数 @@ -67,26 +69,18 @@ DCU Temp AvgPwr Fan Perf PwrCap VRAM% DCU% ===================End of SMI Log=================== ``` -## 安装飞桨框架 - -**注意**:飞桨框架 DCU 版仅支持海光 C86 架构。 - -### 安装方式一:wheel 包安装 - 
-在启动的 docker 容器中,下载并安装飞桨官网发布的 wheel 包。 +### 基于 pip 安装的方式 ```bash # 下载并安装 wheel 包 -python -m pip install --pre paddlepaddle-dcu -i https://www.paddlepaddle.org.cn/packages/nightly/dcu/ +python -m pip install paddlepaddle-dcu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/dcu/ ``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 -### 安装方式二:源代码编译安装 -在启动的 docker 容器中,下载 Paddle 源码并编译,CMAKE 编译选项含义请参见[编译选项表](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#Compile)。 +### 基于源码编译的方式 ```bash # 下载 Paddle 源码 -git clone https://github.com/PaddlePaddle/Paddle.git -b develop +git clone https://github.com/PaddlePaddle/Paddle.git -b release/3.0 cd Paddle # 创建编译目录 @@ -102,12 +96,12 @@ cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DCMAKE_CXX_FLAGS="-Wno-error -w" \ make -j16 # 编译产出在 build/python/dist/ 路径下,使用 pip 安装即可 -pip install -U paddlepaddle_dcu-0.0.0-cp310-cp310-linux_x86_64.whl +python -m pip install -U paddlepaddle_dcu-*-linux_x86_64.whl ``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 + ## 基础功能检查 -安装完成后,在 docker 容器中输入如下命令进行飞桨基础健康功能的检查。 +输入如下命令进行飞桨基础健康功能的检查。 ```bash # 检查当前安装版本 @@ -115,10 +109,20 @@ python -c "import paddle; paddle.version.show()" ``` ```bash # 预期得到输出如下 -commit: d37bd8bcf75cf51f6c1117526f3f67d04946ebb9 +full_version: 3.0.0 +major: 3 +minor: 0 +patch: 0 +rc: 0 cuda: False cudnn: False nccl: 0 +xpu_xre: False +xpu_xccl: False +xpu_xhpc: False +cinn: False +tensorrt_version: None +cuda_archs: [] ``` ```bash # 飞桨基础健康检查 @@ -131,11 +135,3 @@ PaddlePaddle works well on 1 GPU. PaddlePaddle works well on 8 GPUs. PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now. 
``` - -## 如何卸载 - -请使用以下命令卸载: - -```bash -pip uninstall paddlepaddle-dcu -``` diff --git a/docs/hardware_support/dcu/paddle_tutorial_cn.md b/docs/hardware_support/dcu/paddle_tutorial_cn.md index 38bc0ee0488..808bce52336 100644 --- a/docs/hardware_support/dcu/paddle_tutorial_cn.md +++ b/docs/hardware_support/dcu/paddle_tutorial_cn.md @@ -8,20 +8,10 @@ * 考虑到环境差异性,我们推荐使用教程提供的标准镜像完成环境准备: - * 镜像链接:ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle-dcu:dtk24.04.1-kylinv10-gcc82 + * 镜像链接:ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-dcu:3.0.0-dtk24.04.1-kylinv10-gcc82-py310 -### 环境安装 +* 镜像中默认装有 3.0 版本的 PaddlePaddle -安装 PaddlePaddle - -*该命令会自动安装飞桨主框架每日自动构建的 nightly-build 版本* - -*由于 dcu 代码位于飞桨主框架中,因此我们不需要安装额外的 Custom Device 包* - -```shell -python -m pip install --pre paddlepaddle-dcu -i https://www.paddlepaddle.org.cn/packages/nightly/dcu/ -``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 ## 二、运行示例 飞桨框架集成了经典的视觉模型用于帮助用户快速上手,我们将基于 ResNet50 结构,在 Cifar10 数据集上进行一次快速训练,用于帮助您了解如何基于海光 DCU 进行训练 diff --git a/docs/hardware_support/dcu/paddlex_tutorial_cn.md b/docs/hardware_support/dcu/paddlex_tutorial_cn.md index 4bd1030ed2a..041d4c40c54 100644 --- a/docs/hardware_support/dcu/paddlex_tutorial_cn.md +++ b/docs/hardware_support/dcu/paddlex_tutorial_cn.md @@ -8,19 +8,11 @@ * 考虑到环境差异性,我们推荐使用教程提供的标准镜像完成环境准备: - * 镜像链接:ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle-dcu:dtk24.04.1-kylinv10-gcc82 + * 镜像链接:ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-dcu:3.0.0-dtk24.04.1-kylinv10-gcc82-py310 ### 环境安装 -1. 安装 PaddlePaddle - -*该命令会自动安装飞桨主框架每日自动构建的 nightly-build 版本* - -*由于 dcu 代码位于飞桨主框架中,因此我们不需要安装额外的 Custom Device 包* - -```shell -python -m pip install --pre paddlepaddle-dcu -i https://www.paddlepaddle.org.cn/packages/nightly/dcu/ -``` +1. 镜像中默认装有 3.0 版本的 PaddlePaddle,无需额外安装 2. 安装 PaddleX 代码库 @@ -36,7 +28,7 @@ cd PaddleX # -e:以可编辑模式安装,当前项目的代码更改,都会直接作用到已经安装的 PaddleX Wheel pip install -e . 
``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 + ## 基于 PaddleX 训练 ResNet50 ### 一、安装 PaddleX 依赖 diff --git a/docs/hardware_support/gcu/install_cn.md b/docs/hardware_support/gcu/install_cn.md index eeabc7251bb..97134279b61 100644 --- a/docs/hardware_support/gcu/install_cn.md +++ b/docs/hardware_support/gcu/install_cn.md @@ -21,17 +21,21 @@ lspci | grep S60 ## 运行环境准备 -推荐使用飞桨官方发布的燧原 GCU 开发镜像,该镜像预装有[燧原基础软件开发平台(TopsRider)](https://www.enflame-tech.com/developer)。 +您可以基于 docker、pip、源码等不同方式准备飞桨开发环境 + +### 基于 Docker 的方式(推荐) + +我们推荐使用飞桨官方发布的燧原 GCU 开发镜像,该镜像预装有[燧原基础软件开发平台(TopsRider)](https://www.enflame-tech.com/developer)和飞桨 3.0 版本的 SDK。 ```bash # 拉取镜像 -docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-gcu:topsrider3.2.109-ubuntu20-x86_64-gcc84 +docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-gcu:3.0.0-topsrider3.2.109-ubuntu20-x86_64-gcc84-py310 ``` ```bash # 参考如下命令启动容器 docker run --name paddle-gcu-dev -v /home:/home \ --network=host --ipc=host -it --privileged \ - ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-gcu:topsrider3.2.109-ubuntu20-x86_64-gcc84 /bin/bash + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-gcu:3.0.0-topsrider3.2.109-ubuntu20-x86_64-gcc84-py310 /bin/bash ``` #### 选项说明及可调整参数 @@ -70,44 +74,36 @@ efsmi +--------------------------------------------------------------------------+ ``` -## 安装飞桨框架 - -### 安装方式一:wheel 包安装 - -燧原支持插件式安装,需先安装飞桨 CPU 安装包,再安装飞桨 GCU 插件包。在启动的 docker 容器中,执行以下命令: +### 基于 pip 安装的方式 ```bash -# 先安装飞桨 CPU 安装包 -python -m pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu - -# 再安装飞桨 GCU 插件包 -python -m pip install paddle-custom-gcu -i https://www.paddlepaddle.org.cn/packages/nightly/gcu +# 下载并安装 wheel 包 +python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/ +python -m pip install paddle-custom-gcu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/gcu/ ``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 
-### 安装方式二:源代码编译安装 -在启动的 docker 容器中,先安装飞桨 CPU 安装包,再下载 PaddleCustomDevice 源码编译得到飞桨 GCU 插件包。 +### 基于源码编译的方式 ```bash # 下载 PaddleCustomDevice 源码 -git clone https://github.com/PaddlePaddle/PaddleCustomDevice +git clone https://github.com/PaddlePaddle/PaddleCustomDevice -b release/3.0.0 # 进入硬件后端(燧原 GCU)目录 cd PaddleCustomDevice/backends/gcu # 先安装飞桨 CPU 安装包 -python -m pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu +python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/ -# 执行编译命令 - submodule 在编译时会按需下载 +# 执行编译脚本 - submodule 在编译时会按需下载 mkdir -p build && cd build export PADDLE_CUSTOM_PATH=`python -c "import re, paddle; print(re.compile('/__init__.py.*').sub('',paddle.__file__))"` cmake .. -DWITH_TESTING=ON -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DPY_VERSION=3.10 make -j $(nproc) # 飞桨 GCU 插件包在 build/dist 路径下,使用 pip 安装即可 -python -m pip install --force-reinstall -U build/dist/paddle_custom_gcu*.whl +python -m pip install build/dist/paddle_custom_gcu*.whl ``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 + ## 基础功能检查 安装完成后,在 docker 容器中输入如下命令进行飞桨基础健康功能的检查。 @@ -118,26 +114,8 @@ python -c "import paddle_custom_device; paddle_custom_device.gcu.version()" ``` ```bash # 预期得到如下输出结果 -version: 3.0.0.dev20241206 -commit: 7a2766768cc92aa94cc3d0ea6c23e8397f15f68a +version: 3.0.0 +commit: e6e31bd475e38c18d2c39d58fad903bd16b3ca0d TopsPlatform: 1.2.0.301 .... ``` -```bash -# 飞桨基础健康检查 -python -c "import paddle; paddle.utils.run_check()" -``` -```bash -# 预期得到输出如下 -Running verify PaddlePaddle program ... -PaddlePaddle works well on 1 gcu. -PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
-``` - -## 如何卸载 - -请使用以下命令卸载 Paddle: - -```bash -python -m pip uninstall paddlepaddle paddle-custom-gcu -``` diff --git a/docs/hardware_support/gcu/paddlex_tutorial_cn.md b/docs/hardware_support/gcu/paddlex_tutorial_cn.md index 3152b543fd6..603a2c464b0 100644 --- a/docs/hardware_support/gcu/paddlex_tutorial_cn.md +++ b/docs/hardware_support/gcu/paddlex_tutorial_cn.md @@ -8,7 +8,7 @@ * 考虑到环境差异性,我们推荐使用教程提供的标准镜像完成环境准备: - * 镜像链接:ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-gcu:topsrider3.2.109-ubuntu20-x86_64-gcc84 + * 镜像链接:ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-gcu:3.0.0-topsrider3.2.109-ubuntu20-x86_64-gcc84-py310 * 镜像中已经默认安装了燧原软件栈 TopsRider-3.2.109 @@ -16,23 +16,9 @@ ### 环境安装 -1. 安装 PaddlePaddle +1. 镜像中默认装有 3.0 版本的 PaddlePaddle,无需额外安装 -*该命令会自动安装飞桨主框架每日自动构建的 nightly-build 版本* - -```shell -python -m pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/ -``` - -2. 安装 CustomDevice - -*该命令会自动安装飞桨 Custom Device 每日自动构建的 nightly-build 版本* - -```shell -python -m pip install paddle-custom-gcu -i https://www.paddlepaddle.org.cn/packages/nightly/gcu/ -``` - -3. 安装 PaddleX 代码库 +2. 安装 PaddleX 代码库 ```shell git clone https://github.com/PaddlePaddle/PaddleX.git @@ -46,7 +32,7 @@ cd PaddleX # -e:以可编辑模式安装,当前项目的代码更改,都会直接作用到已经安装的 PaddleX Wheel pip install -e . 
``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 + ## 基于 PaddleX 进行 ResNet50 推理 ### 一、安装 PaddleX 依赖 diff --git a/docs/hardware_support/hardware_info_cn.md b/docs/hardware_support/hardware_info_cn.md index ab4bb6b67cf..412df5d33f1 100644 --- a/docs/hardware_support/hardware_info_cn.md +++ b/docs/hardware_support/hardware_info_cn.md @@ -17,6 +17,7 @@ | AI 加速芯片 | | 壁仞 | BR100、BR104 | | [源码编译](https://github.com/PaddlePaddle/PaddleCustomDevice/blob/develop/backends/biren_gpu/README_cn.md) |[代码仓库](https://github.com/PaddlePaddle/PaddleCustomDevice) | | AI 加速芯片 | | 燧原 | 云燧 T20 、i20、S60 | | [源码编译](https://github.com/PaddlePaddle/PaddleCustomDevice/blob/develop/backends/gcu/README_cn.md) |[代码仓库](https://github.com/PaddlePaddle/PaddleCustomDevice) | | AI 加速芯片 | | 太初 | 元碁系列 | | [源码编译](https://github.com/PaddlePaddle/PaddleTecoBackend) |[代码仓库](https://github.com/PaddlePaddle/PaddleCustomDevice) | +| AI 加速芯片 | | 沐曦 | 曦云C系列 | | |[代码仓库](https://github.com/PaddlePaddle/PaddleCustomDevice) | ## Paddle Inference diff --git a/docs/hardware_support/mlu/install_cn.md b/docs/hardware_support/mlu/install_cn.md index e656681bb9c..28cbb8ccf0e 100644 --- a/docs/hardware_support/mlu/install_cn.md +++ b/docs/hardware_support/mlu/install_cn.md @@ -21,11 +21,15 @@ lspci -vvt | grep 370 ## 运行环境准备 -推荐使用飞桨官方发布的寒武纪 MLU 开发镜像,该镜像预装有[寒武纪基础软件开发平台](https://developer.cambricon.com/)。 +您可以基于 docker、pip、源码等不同方式准备飞桨开发环境 + +### 基于 Docker 的方式(推荐) + +我们推荐使用飞桨官方发布的寒武纪 MLU 开发镜像,该镜像预装有[寒武纪基础软件开发平台](https://developer.cambricon.com/)和飞桨 3.0 版本的 SDK。 ```bash # 拉取镜像 -docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-mlu:ctr2.15.0-ubuntu20-gcc84-py310 +docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-mlu:3.0.0-ctr2.15.0-ubuntu20-gcc84-py310 ``` ```bash # 参考如下命令,启动容器 @@ -33,7 +37,7 @@ docker run -it --name paddle-mlu-dev -v $(pwd):/work \ -w=/work --shm-size=128G --network=host --privileged \ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \ -v 
/usr/bin/cnmon:/usr/bin/cnmon \ - ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-mlu:ctr2.15.0-ubuntu20-gcc84-py310 /bin/bash + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-mlu:3.0.0-ctr2.15.0-ubuntu20-gcc84-py310 /bin/bash ``` #### 选项说明及可调整参数 @@ -79,46 +83,36 @@ cnmon +------------------------------------------------------------------------------+ ``` -## 安装飞桨框架 - -**注意**:当前飞桨 develop 分支仅支持 X86 架构,暂不支持寒武纪 MLU 的 ARM 架构。 - -### 安装方式一:wheel 包安装 - -寒武纪支持插件式安装,需先安装飞桨 CPU 安装包,再安装飞桨 MLU 插件包。在启动的 docker 容器中,执行以下命令: +### 基于 pip 安装的方式 ```bash -# 先安装飞桨 CPU 安装包 -pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu - -# 再安装飞桨 MLU 插件包 -pip install paddle-custom-mlu -i https://www.paddlepaddle.org.cn/packages/nightly/mlu +# 下载并安装 wheel 包 +python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/ +python -m pip install paddle-custom-mlu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/mlu/ ``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 -### 安装方式二:源代码编译安装 -在启动的 docker 容器中,先安装飞桨 CPU 安装包,再下载 PaddleCustomDevice 源码编译得到飞桨 MLU 插件包。 +### 基于源码编译的方式 ```bash # 下载 PaddleCustomDevice 源码 -git clone https://github.com/PaddlePaddle/PaddleCustomDevice +git clone https://github.com/PaddlePaddle/PaddleCustomDevice -b release/3.0.0 # 进入硬件后端(寒武纪 MLU)目录 cd PaddleCustomDevice/backends/mlu # 先安装飞桨 CPU 安装包 -pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu +python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/ # 执行编译脚本 - submodule 在编译时会按需下载 bash tools/compile.sh # 飞桨 MLU 插件包在 build/dist 路径下,使用 pip 安装即可 -pip install build/dist/paddle_custom_mlu*.whl +python -m pip install build/dist/paddle_custom_mlu*.whl ``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 + ## 基础功能检查 -安装完成后,在 docker 容器中输入如下命令进行飞桨基础健康功能的检查。 +输入如下命令进行飞桨基础健康功能的检查。 ```bash # 检查当前安装版本 @@ -126,13 +120,13 @@ python -c "import paddle_custom_device; 
paddle_custom_device.mlu.version()" ``` ```bash # 预期得到如下输出结果 -version: 0.0.0 -commit: 147d506b2baa1971ab47b4550f0571e1f6b201fc -cntoolkit: 3.8.2 -cnnl: 1.23.2 -cnnl_extra: 1.6.1 -cncl: 1.14.0 -mluops: 0.11.0 +version: 3.0.0 +commit: e6e31bd475e38c18d2c39d58fad903bd16b3ca0d +cntoolkit: 3.10.1 +cnnl: 1.25.1 +cnnl_extra: 1.8.1 +cncl: 1.16.0 +mluops: 1.1.1 ``` ```bash # 飞桨基础健康检查 @@ -145,11 +139,3 @@ PaddlePaddle works well on 1 mlu. PaddlePaddle works well on 8 mlus. PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now. ``` - -## 如何卸载 - -请使用以下命令卸载: - -```bash -pip uninstall paddlepaddle paddle-custom-mlu -``` diff --git a/docs/hardware_support/mlu/paddle_tutorial_cn.md b/docs/hardware_support/mlu/paddle_tutorial_cn.md index fd4fe154461..0d79a7f7427 100644 --- a/docs/hardware_support/mlu/paddle_tutorial_cn.md +++ b/docs/hardware_support/mlu/paddle_tutorial_cn.md @@ -8,26 +8,10 @@ * 考虑到环境差异性,我们推荐使用教程提供的标准镜像完成环境准备: - * 镜像链接: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-mlu:ctr2.15.0-ubuntu20-gcc84-py310 + * 镜像链接: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-mlu:3.0.0-ctr2.15.0-ubuntu20-gcc84-py310 -### 环境安装 +* 镜像中默认装有 3.0 版本的 PaddlePaddle -1. 安装 PaddlePaddle - -*该命令会自动安装飞桨主框架每日自动构建的 nightly-build 版本* - -```shell -python -m pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/ -``` - -2. 
安装 CustomDevice - -*该命令会自动安装飞桨 Custom Device 每日自动构建的 nightly-build 版本* - -```shell -python -m pip install --pre paddle-custom-mlu -i https://www.paddlepaddle.org.cn/packages/nightly/mlu/ -``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 ## 二、运行示例 飞桨框架集成了经典的视觉模型用于帮助用户快速上手,我们将基于 ResNet50 结构,在 Cifar10 数据集上进行一次快速训练,用于帮助您了解如何基于寒武纪 MLU 进行训练(和 GPU 训练代码相比,差异点仅为 `paddle.set_device("mlu")`) diff --git a/docs/hardware_support/mlu/paddlex_tutorial_cn.md b/docs/hardware_support/mlu/paddlex_tutorial_cn.md index ae77d070e62..7612e7487ed 100644 --- a/docs/hardware_support/mlu/paddlex_tutorial_cn.md +++ b/docs/hardware_support/mlu/paddlex_tutorial_cn.md @@ -8,17 +8,11 @@ * 考虑到环境差异性,我们推荐使用教程提供的标准镜像完成环境准备: - * 镜像链接: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-mlu:ctr2.15.0-ubuntu20-gcc84-py310 + * 镜像链接: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-mlu:3.0.0-ctr2.15.0-ubuntu20-gcc84-py310 ### 环境安装 -1. 安装 PaddlePaddle - -*该命令会自动安装飞桨主框架每日自动构建的 nightly-build 版本* - -```shell -python -m pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/ -``` +1. 镜像中默认装有 3.0 版本的 PaddlePaddle,无需额外安装 2. 安装 CustomDevice @@ -42,7 +36,7 @@ cd PaddleX # -e:以可编辑模式安装,当前项目的代码更改,都会直接作用到已经安装的 PaddleX Wheel pip install -e . 
``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 + ## 基于 PaddleX 训练 ResNet50 ### 一、安装 PaddleX 依赖 diff --git a/docs/hardware_support/npu/install_cn.md b/docs/hardware_support/npu/install_cn.md index a944fd99afd..84066bab01d 100644 --- a/docs/hardware_support/npu/install_cn.md +++ b/docs/hardware_support/npu/install_cn.md @@ -24,12 +24,16 @@ lspci | grep d802 ## 运行环境准备 -推荐使用飞桨官方发布的昇腾 NPU 开发镜像,该镜像预装有[昇腾基础软件开发平台(CANN)](https://www.hiascend.com/software/cann)。 +您可以基于 docker、pip、源码等不同方式准备飞桨开发环境 + +### 基于 Docker 的方式(推荐) + +我们推荐使用飞桨官方发布的昇腾 NPU 开发镜像,该镜像预装有[昇腾基础软件开发平台(CANN)](https://www.hiascend.com/software/cann)和飞桨 3.0 版本的 SDK。 ```bash # 拉取镜像 -docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann80RC2-ubuntu20-npu-base-x86_64-gcc84 # X86 架构 -docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann80RC2-ubuntu20-npu-base-aarch64-gcc84 # ARM 架构 +docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:3.0.0-cann800-ubuntu20-npu-910b-x86_64-gcc84-py310 # X86 架构 +docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:3.0.0-cann800-ubuntu20-npu-910b-aarch64-gcc84-py310 # ARM 架构 ``` ```bash # 参考如下命令启动容器,ASCEND_RT_VISIBLE_DEVICES 可指定可见的 NPU 卡号 @@ -39,7 +43,7 @@ docker run -it --name paddle-npu-dev -v $(pwd):/work \ -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \ -v /usr/local/dcmi:/usr/local/dcmi \ -e ASCEND_RT_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" \ - ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann80RC2-ubuntu20-$(uname -m)-gcc84 /bin/bash + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:3.0.0-cann800-ubuntu20-npu-910b-$(uname -m)-gcc84-py310 /bin/bash ``` #### 选项说明及可调整参数 @@ -86,44 +90,38 @@ npu-smi info +===========================+===============+====================================================+ ``` -## 安装飞桨框架 - -### 安装方式一:wheel 包安装 - -昇腾支持插件式安装,需先安装飞桨 CPU 安装包,再安装飞桨 NPU 插件包。在启动的 docker 容器中,执行以下命令: +### 基于 pip 安装的方式 ```bash # 先安装飞桨 CPU 安装包 -pip install paddlepaddle -i
https://www.paddlepaddle.org.cn/packages/nightly/cpu +python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/ # 再安装飞桨 NPU 插件包 -pip install paddle-custom-npu -i https://www.paddlepaddle.org.cn/packages/nightly/npu +python -m pip install paddle-custom-npu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/npu/ ``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 -### 安装方式二:源代码编译安装 -在启动的 docker 容器中,先安装飞桨 CPU 安装包,再下载 PaddleCustomDevice 源码编译得到飞桨 NPU 插件包。 +### 基于源码编译的方式 ```bash # 下载 PaddleCustomDevice 源码 -git clone https://github.com/PaddlePaddle/PaddleCustomDevice +git clone https://github.com/PaddlePaddle/PaddleCustomDevice -b release/3.0.0 # 进入硬件后端(昇腾 NPU)目录 cd PaddleCustomDevice/backends/npu # 先安装飞桨 CPU 安装包 -pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu +python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/ # 执行编译脚本 - submodule 在编译时会按需下载 bash tools/compile.sh # 飞桨 NPU 插件包在 build/dist 路径下,使用 pip 安装即可 -pip install build/dist/paddle_custom_npu*.whl +python -m pip install build/dist/paddle_custom_npu*.whl ``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 + ## 基础功能检查 -安装完成后,在 docker 容器中输入如下命令进行飞桨基础健康功能的检查。 +输入如下命令进行飞桨基础健康功能的检查。 ```bash # 检查当前安装版本 @@ -131,10 +129,10 @@ python -c "import paddle_custom_device; paddle_custom_device.npu.version()" ``` ```bash # 预期得到如下输出结果 -version: 0.0.0 -commit: 147d506b2baa1971ab47b4550f0571e1f6b201fc -cann: 8.0.RC2 -.... +version: 3.0.0 +commit: e6e31bd475e38c18d2c39d58fad903bd16b3ca0d +custom_op commit: e6e31bd475e38c18d2c39d58fad903bd16b3ca0d +cann: 8.0.0 ``` ```bash # 飞桨基础健康检查 @@ -148,17 +146,9 @@ PaddlePaddle works well on 8 npus. PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now. 
``` -## 如何卸载 - -请使用以下命令卸载 Paddle: - -```bash -pip uninstall paddlepaddle paddle-custom-npu -``` - ## 常见问题解决 -* CANN-8.0.RC2 对 numpy 和 opencv 部分版本不支持,建议安装指定版本 +* CANN-8.0.x 系列 对 numpy 和 opencv 部分版本不支持,建议安装指定版本 ```bash python -m pip install numpy==1.26.4 python -m pip install opencv-python==3.4.18.65 diff --git a/docs/hardware_support/npu/paddle_tutorial_cn.md b/docs/hardware_support/npu/paddle_tutorial_cn.md index 4441631fdca..f5d84fbc0a0 100644 --- a/docs/hardware_support/npu/paddle_tutorial_cn.md +++ b/docs/hardware_support/npu/paddle_tutorial_cn.md @@ -8,32 +8,16 @@ * 考虑到环境差异性,我们推荐使用教程提供的标准镜像完成环境准备: - * x86_64 镜像链接:ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann80RC2-ubuntu20-npu-base-x86_64-gcc84 + * x86_64 镜像链接:ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:3.0.0-cann800-ubuntu20-npu-910b-x86_64-gcc84-py310 - * aarch64 镜像链接:ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann80RC2-ubuntu20-npu-base-aarch64-gcc84 + * aarch64 镜像链接:ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:3.0.0-cann800-ubuntu20-npu-910b-aarch64-gcc84-py310 - * 镜像中已经默认安装了昇腾算子库 CANN-8.0.RC2 + * 镜像中已经默认安装了昇腾算子库 CANN-8.0.0 * 昇腾驱动版本为 23.0.3 -### 环境安装 +* 镜像中默认装有 3.0 版本的 PaddlePaddle -1. 安装 PaddlePaddle - -*该命令会自动安装飞桨主框架每日自动构建的 nightly-build 版本* - -```shell -python -m pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/ -``` - -2. 
安装 CustomDevice - -*该命令会自动安装飞桨 Custom Device 每日自动构建的 nightly-build 版本* - -```shell -python -m pip install paddle-custom-npu -i https://www.paddlepaddle.org.cn/packages/nightly/npu/ -``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 ## 二、运行示例 飞桨框架集成了经典的视觉模型用于帮助用户快速上手,我们将基于 ResNet50 结构,在 Cifar10 数据集上进行一次快速训练,用于帮助您了解如何基于昇腾 NPU 进行训练(和 GPU 训练代码相比,差异点仅为 `paddle.set_device("npu")`) diff --git a/docs/hardware_support/npu/paddlex_tutorial_cn.md b/docs/hardware_support/npu/paddlex_tutorial_cn.md index 0c1cbd5804e..18a8230416d 100644 --- a/docs/hardware_support/npu/paddlex_tutorial_cn.md +++ b/docs/hardware_support/npu/paddlex_tutorial_cn.md @@ -8,23 +8,17 @@ * 考虑到环境差异性,我们推荐使用教程提供的标准镜像完成环境准备: - * x86_64 镜像链接:ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann80RC2-ubuntu20-npu-base-x86_64-gcc84 + * x86_64 镜像链接:ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:3.0.0-cann800-ubuntu20-npu-910b-x86_64-gcc84-py310 - * aarch64 镜像链接:ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann80RC2-ubuntu20-npu-base-aarch64-gcc84 + * aarch64 镜像链接:ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:3.0.0-cann800-ubuntu20-npu-910b-aarch64-gcc84-py310 - * 镜像中已经默认安装了昇腾算子库 CANN-8.0.RC2 + * 镜像中已经默认安装了昇腾算子库 CANN-8.0.0 * 昇腾驱动版本为 23.0.3 ### 环境安装 -1. 安装 PaddlePaddle - -*该命令会自动安装飞桨主框架每日自动构建的 nightly-build 版本* - -```shell -python -m pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/ -``` +1. 镜像中默认装有 3.0 版本的 PaddlePaddle,无需额外安装 2. 安装 CustomDevice @@ -48,7 +42,7 @@ cd PaddleX # -e:以可编辑模式安装,当前项目的代码更改,都会直接作用到已经安装的 PaddleX Wheel pip install -e . 
``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 + ## 基于 PaddleX 训练 ResNet50 ### 一、安装 PaddleX 依赖 diff --git a/docs/hardware_support/xpu/index_cn.rst b/docs/hardware_support/xpu/index_cn.rst index e81bdc05a6c..607f5f935db 100644 --- a/docs/hardware_support/xpu/index_cn.rst +++ b/docs/hardware_support/xpu/index_cn.rst @@ -10,24 +10,16 @@ 更多昆仑芯 XPU 芯片详情及技术指标请 `点击这里 `_ 。 -飞桨框架支持基于昆仑芯 XPU 芯片的训练和推理,请参考以下内容快速体验: +飞桨框架支持基于昆仑芯 XPU 芯片的训练和推理,请参考以下内容快速体验(注意,从 3.0 版本开始,飞桨不再支持昆仑 2 代芯片,如果需要使用相关功能,请使用 3.0rc 版本): -- `昆仑芯 XPU 安装说明 <./xpu-gen2_install_cn.html>`_: 昆仑芯 XPU 二代芯片安装说明 -- `昆仑芯 XPU 基于框架的使用指南 <./xpu-gen2_paddle_tutorial_cn.html>`_ : 昆仑芯 XPU 二代芯片基于框架的使用指南 -- `昆仑芯 XPU 基于套件的使用指南 <./xpu-gen2_paddlex_tutorial_cn.html>`_ : 昆仑芯 XPU 二代芯片基于套件的使用指南 -- `昆仑芯 XPU 支持模型 <./xpu-gen2_support_cn.html>`_ : 昆仑芯 XPU 二代芯片支持模型 -- `昆仑芯 XPU 安装说明 <./xpu-p800_install_cn.html>`_: 昆仑芯 XPU P800 安装说明 -- `昆仑芯 XPU 基于框架的使用指南 <./xpu-p800_paddle_tutorial_cn.html>`_ : 昆仑芯 XPU P800 基于框架的使用指南 -- `昆仑芯 XPU 基于套件的使用指南 <./xpu-p800_paddlex_tutorial_cn.html>`_ : 昆仑芯 XPU P800 基于套件的使用指南 -- `昆仑芯 XPU 支持模型 <./xpu-p800_support_cn.html>`_ : 昆仑芯 XPU P800 支持模型 +- `昆仑芯 XPU P800 安装说明 <./xpu-p800_install_cn.html>`_ : 昆仑芯 XPU P800 安装说明 +- `昆仑芯 XPU P800 基于框架的使用指南 <./xpu-p800_paddle_tutorial_cn.html>`_ : 昆仑芯 XPU P800 基于框架的使用指南 +- `昆仑芯 XPU P800 基于套件的使用指南 <./xpu-p800_paddlex_tutorial_cn.html>`_ : 昆仑芯 XPU P800 基于套件的使用指南 +- `昆仑芯 XPU P800 支持模型 <./xpu-p800_support_cn.html>`_ : 昆仑芯 XPU P800 支持模型 .. toctree:: :hidden: - xpu-gen2_install_cn.md - xpu-gen2_paddle_tutorial_cn.md - xpu-gen2_paddlex_tutorial_cn.md - xpu-gen2_support_cn.md xpu-p800_install_cn.md xpu-p800_paddle_tutorial_cn.md xpu-p800_paddlex_tutorial_cn.md diff --git a/docs/hardware_support/xpu/xpu-gen2_install_cn.md b/docs/hardware_support/xpu/xpu-gen2_install_cn.md deleted file mode 100644 index e9406376acd..00000000000 --- a/docs/hardware_support/xpu/xpu-gen2_install_cn.md +++ /dev/null @@ -1,154 +0,0 @@ -# 昆仑芯 XPU 安装说明 - -飞桨框架 XPU 版支持昆仑芯 XPU 的训练和推理,提供两种安装方式: - -1. 
通过飞桨官网发布的 wheel 包安装 -2. 通过源代码编译安装得到 wheel 包 - -## 昆仑芯 XPU 系统要求 - -| 要求类型 | 要求内容 | -| --------- | -------- | -| 芯片型号 | 昆仑芯 2 代,包括 R200、R300、R200-8F、RG800 | -| 操作系统 | Linux 操作系统,包括 Ubuntu、CentOS、KylinV10 | - -**注意**:当前教程适用于『昆仑芯』二代芯片。查看芯片类型请参考如下命令: - -```bash -# 系统环境下运行如下命令,如果有设备列表输出,且字段为 3684,则说明芯片为昆仑芯二代芯片 -lspci -d 1d22: -n -``` - -## 运行环境准备 - -推荐使用飞桨官方发布的昆仑芯 XPU 开发镜像,该镜像预装有昆仑芯基础运行环境库(XRE)。 - -```bash -# 拉取镜像 -docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-xpu:ubuntu20-x86_64-gcc84-py310 -``` -```bash -# 参考如下命令,启动容器 -docker run -it --name paddle-xpu-dev -v $(pwd):/work \ - -w=/work --shm-size=128G --network=host --privileged \ - --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \ - ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-xpu:ubuntu20-x86_64-gcc84-py310 /bin/bash -``` -#### 选项说明及可调整参数 - -##### ① `--name paddle-xpu-dev` -- **作用**:指定容器名称。 -- **可调整**: - - 用户可改为其他名称,例如 `paddle-xpu-test`,方便区分不同实验。 - -##### ② `-v $(pwd):/work` -- **作用**:挂载本地目录到容器内 `/work` 目录。 -- **可调整**: - - 可以修改 `$(pwd)` 为实际路径,例如 `-v /data/projects:/work`,让容器访问宿主机的数据。 - -##### ③ `--shm-size=128G` -- **作用**:设置共享内存大小,影响数据处理和计算效率。 -- **可调整**: - - 若内存有限,可降低,如 `--shm-size=32G`,但可能影响大规模训练。 - - 若训练任务需要更大共享内存,可提高,如 `--shm-size=256G`。 -```bash -# 检查容器内是否可以正常识别昆仑芯 XPU 设备 -xpu_smi -``` -```bash -# 预期得到输出如下 -Runtime Version: 4.31 -Driver Version: 4.0 - DEVICES -------------------------------------------------------------------------------------------- -| DevID | PCI Addr | Model | SN | INODE | UseRate | L3 | Memory | -------------------------------------------------------------------------------------------- -| 0 | 0000:53:00.0 | R300 | 02Kxxx | /dev/xpu0 | 0 % | 0 / 63 MB | 0 / 32768 MB | -| 1 | 0000:56:00.0 | R300 | 02Kxxx | /dev/xpu1 | 0 % | 0 / 63 MB | 0 / 32768 MB | -------------------------------------------------------------------------------------------- - VIDEO ------------------------------------------------------------------------------------ -| DevID | Model | DEC | ENC | 
IMGPROC | ------------------------------------------------------------------------------------ -| 0 | R300 | 0 %, 0 fps, 800 MHz | 0 %, 0 fps, 800 MHz | 0 %, 0 fps, 800 MHz | -| 1 | R300 | 0 %, 0 fps, 800 MHz | 0 %, 0 fps, 800 MHz | 0 %, 0 fps, 800 MHz | ------------------------------------------------------------------------------------ - PROCESSES -------------------------------------------------- -| DevID | PID | Streams | L3 | Memory | Command | -------------------------------------------------- -------------------------------------------------- -``` - -## 安装飞桨框架 - -**注意**:当前飞桨 develop 分支仅支持 X86 架构,如需昆仑芯 XPU 的 ARM 架构支持,请切换到 [release/2.6](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.6/guides/hardware_support/xpu/install_cn.html) 分支。 - -### 安装方式一:wheel 包安装 - -在启动的 docker 容器中,下载并安装飞桨官网发布的 wheel 包。 - -```bash -# 下载并安装 wheel 包 -pip install paddlepaddle-xpu -i https://www.paddlepaddle.org.cn/packages/nightly/xpu -``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 -### 安装方式二:源代码编译安装 - -在启动的 docker 容器中,下载 Paddle 源码并编译,CMAKE 编译选项含义请参见[编译选项表](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#Compile)。 - -```bash -# 下载 Paddle 源码 -git clone https://github.com/PaddlePaddle/Paddle.git -b develop -cd Paddle - -# 创建编译目录 -mkdir build && cd build - -# cmake 编译命令 -cmake .. 
-DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DCMAKE_CXX_FLAGS="-Wno-error -w" \ - -DPY_VERSION=3.10 -DPYTHON_EXECUTABLE=`which python3` -DWITH_CUSTOM_DEVICE=OFF \ - -DWITH_TESTING=OFF -DON_INFER=ON -DWITH_DISTRIBUTE=ON -DWITH_ARM=OFF \ - -DWITH_XPU=ON -DWITH_XPU_BKCL=ON -DWITH_UBUNTU=ON - -# make 编译命令 -make -j16 - -# 编译产出在 build/python/dist/ 路径下,使用 pip 安装即可 -pip install -U paddlepaddle_xpu-0.0.0-cp310-cp310-linux_x86_64.whl -``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 -## 基础功能检查 - -安装完成后,在 docker 容器中输入如下命令进行飞桨基础健康功能的检查。 - -```bash -# 检查当前安装版本 -python -c "import paddle; paddle.version.show()" -``` -```bash -# 预期得到输出如下 -commit: 84425362060e126b066a5a0f0d29ae2e2218a834 -xpu: 20240104 -xpu_xccl: 1.1.8.1 -xpu_xhpc: 20240312 -``` -```bash -# 飞桨基础健康检查 -python -c "import paddle; paddle.utils.run_check()" -``` -```bash -# 预期得到输出如下 -Running verify PaddlePaddle program ... -PaddlePaddle works well on 1 XPU. -PaddlePaddle works well on 8 XPUs. -PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now. 
-``` - -## 如何卸载 - -请使用以下命令卸载: - -```bash -pip uninstall paddlepaddle-xpu -``` diff --git a/docs/hardware_support/xpu/xpu-gen2_paddle_tutorial_cn.md b/docs/hardware_support/xpu/xpu-gen2_paddle_tutorial_cn.md deleted file mode 100644 index df6a9f17f8a..00000000000 --- a/docs/hardware_support/xpu/xpu-gen2_paddle_tutorial_cn.md +++ /dev/null @@ -1,110 +0,0 @@ -# 昆仑芯 XPU 基于框架的使用指南 - -## 一、环境准备 - -### 环境说明 - -* 本教程介绍如何基于昆仑芯 XPU 进行 ResNet50 的训练,总共需要 1 卡进行训练 - -* 考虑到环境差异性,我们推荐使用教程提供的标准镜像完成环境准备: - - * 镜像链接: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-xpu:ubuntu20-x86_64-gcc84-py310 - -### 环境安装 - -安装 PaddlePaddle - -*该命令会自动安装飞桨主框架每日自动构建的 nightly-build 版本* - -*由于 xpu 代码位于飞桨主框架中,因此我们不需要安装额外的 Custom Device 包* - -```shell -python -m pip install paddlepaddle-xpu -i https://www.paddlepaddle.org.cn/packages/nightly/xpu/ -``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 -## 二、运行示例 - -飞桨框架集成了经典的视觉模型用于帮助用户快速上手,我们将基于 ResNet50 结构,在 Cifar10 数据集上进行一次快速训练,用于帮助您了解如何基于昆仑芯 XPU 进行训练(和 GPU 训练代码相比,差异点仅为 `paddle.set_device("xpu")`) - -注意: - -* *本教程主要用于快速入门,并未对参数进行细致调优,训练效果未必是最好的,您可以自行调整超参数进行效果调优* - -* *本教程预计使用单卡 R300 训练 40 分钟* - -1. 导入必要的包 - -```python -import paddle -from paddle.vision import transforms -from paddle.vision.models import resnet50 -``` - -2. 设置运行设备 - -```python -# 1. 设定运行设备为 xpu -paddle.set_device("xpu") -``` - -3. 加载训练数据集 - -```python -# 2. 定义数据集、数据预处理方法与 DataLoader -transform = transforms.Compose([ - transforms.Resize(224), - transforms.ToTensor(), - transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) -]) -train_set = paddle.vision.datasets.Cifar10(mode='train', transform=transform) -train_loader = paddle.io.DataLoader(train_set, batch_size=128, num_workers=8) -``` - -4. 定义网络结构和损失函数 - -```python -# 3. 定义网络结构 -net = resnet50(num_classes=10) -# 4. 定义损失函数 -net_loss = paddle.nn.CrossEntropyLoss() -# 5. 定义优化器 -optimizer = paddle.optimizer.Adam(learning_rate=0.001, parameters=net.parameters()) -``` - -5. 
启动训练 - -训练过程中会打印 loss 的变化情况,可以观察到 loss 在初步下降,这意味着模型参数逐渐适应了该数据集。 - -```python -net.train() -for epoch in range(10): - for batch_idx, data in enumerate(train_loader, start=0): - inputs, labels = data - optimizer.clear_grad() - # 6. 前向传播并计算损失 - outputs = net(inputs) - loss = net_loss(outputs, labels) - # 7. 反向传播 - loss.backward() - # 8. 更新参数 - optimizer.step() - print('Epoch %d, Iter %d, Loss: %.5f' % (epoch + 1, batch_idx + 1, loss)) -print('Finished Training') -``` - -6. 测试模型效果 - -```python -test_dataset = paddle.vision.datasets.Cifar10(mode='test', transform=transform) - -# 测试 5 张图片效果 -for i in range(5): - test_image, gt = test_dataset[0] - # CHW -> NCHW - test_image = test_image.unsqueeze(0) - - # 取预测分布中的最大值 - res = net(test_image).argmax().numpy() - print(f"图像{i} 标签:{gt}") - print(f"模型预测结果:{res}") -``` diff --git a/docs/hardware_support/xpu/xpu-gen2_paddlex_tutorial_cn.md b/docs/hardware_support/xpu/xpu-gen2_paddlex_tutorial_cn.md deleted file mode 100644 index 51084a9ebba..00000000000 --- a/docs/hardware_support/xpu/xpu-gen2_paddlex_tutorial_cn.md +++ /dev/null @@ -1,136 +0,0 @@ -# 昆仑芯 XPU 基于 PaddleX 的使用指南 - -## 环境准备 - -### 环境说明 - -* 本教程介绍如何基于昆仑芯 XPU 进行 ResNet50 的训练,总共需要 4 卡进行训练 - -* 考虑到环境差异性,我们推荐使用教程提供的标准镜像完成环境准备: - - * 镜像链接: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-xpu:ubuntu20-x86_64-gcc84-py310 - -### 环境安装 - -1. 安装 PaddlePaddle - -*该命令会自动安装飞桨主框架每日自动构建的 nightly-build 版本* - -*由于 xpu 代码位于飞桨主框架中,因此我们不需要安装额外的 Custom Device 包* - -```shell -python -m pip install paddlepaddle-xpu -i https://www.paddlepaddle.org.cn/packages/nightly/xpu/ -``` - -2. 安装 PaddleX 代码库 - -```shell -git clone https://github.com/PaddlePaddle/PaddleX.git - -# 如果速度较慢,可以考虑从 gitee 拉取 -# git clone https://gitee.com/paddlepaddle/PaddleX.git - -cd PaddleX - -# 安装 PaddleX whl -# -e:以可编辑模式安装,当前项目的代码更改,都会直接作用到已经安装的 PaddleX Wheel -pip install -e . 
-``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 -## 基于 PaddleX 训练 ResNet50 - -### 一、安装 PaddleX 依赖 - -```shell -# 跳转到 PaddleX 根目录下 -cd /path/to/paddlex - -# 安装 PaddleX 相关依赖,由于我们使用的是图像分类模型,因此安装图像分类库 -paddlex --install PaddleClas - -# 完成安装后会有如下提示: -# All packages are installed. -``` - -### 二、数据准备 - -为了快速上手验证,我们基于 flowers 102 数据集进行快速体验: - -1. 下载数据集 - -```shell -# 跳转到 PaddleX 根目录下 -cd /path/to/paddlex - -# 下载并解压数据 -wget https://paddle-model-ecology.bj.bcebos.com/paddlex/data/cls_flowers_examples.tar -P ./dataset -tar -xf ./dataset/cls_flowers_examples.tar -C ./dataset/ -``` - -2. 数据校验 - -```shell -# PaddleX 支持对数据集进行校验,确保数据集格式符合 PaddleX 的相关要求。同时在数据校验时,能够对数据集进行分析,统计数据集的基本信息。 -python main.py -c paddlex/configs/image_classification/ResNet50.yaml \ - -o Global.mode=check_dataset \ - -o Global.dataset_dir=./dataset/cls_flowers_examples - -# 命令运行成功后会在 log 中打印出 Check dataset passed ! 信息 -``` - -更多关于 PaddleX 数据集说明的内容,可以查看 [PaddleX 图像分类模块数据准备](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/module_usage/tutorials/cv_modules/image_classification.md#41-%E6%95%B0%E6%8D%AE%E5%87%86%E5%A4%87) - -### 三、模型训练 - -进入 `PaddleX` 目录下,执行如下命令启动 4 卡 XPU(0 ~ 3 号卡)训练,其中: - -* 参数 `-o Global.device` 指定的是即将运行的设备,这里需要传入的是 `xpu:0,1,2,3` ,通过指定该参数,PaddleX 调用飞桨的设备指定接口 `paddle.set_device` 来指定运行设备为 `xpu` ,在进行模型训练时,飞桨将自动调用 xpu 算子用于执行模型计算。关于设备指定的更多细节,可以参考官方 api [paddle.set_device](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/device/set_device_cn.html#set-device)。 - -* 参数 `-c paddlex/configs/modules/image_classification/ResNet50.yaml` 表示读取指定目录下的配置文件,配置文件中指定了模型结构,训练超参等所有训练模型需要用到的配置,该文件中指定的模型结构为 `ResNet50` - -```shell -python main.py -c paddlex/configs/modules/image_classification/ResNet50.yaml \ - -o Global.mode=train \ - -o Global.dataset_dir=./dataset/cls_flowers_examples \ - -o Global.output=resnet50_output \ - -o Global.device="xpu:0,1,2,3" -``` - -上述命令会在 `PaddleX` 目录下产生一个 `resnet50_output/` 目录,该目录会存放训练过程中的模型参数 - -### 四、模型推理 - -#### 基于 PaddleInference 
推理 - -训练完成后,最优权重放在 `resnet50_output/best_model/` 目录下,其中 `inference/inference.pdiparams`、`inference/inference.pdiparams.info`、`inference/inference.pdmodel` 3 个文件为静态图文件,用于推理使用,使用如下命令进行推理 - -```shell -python main.py -c paddlex/configs/modules/image_classification/ResNet50.yaml \ - -o Global.mode=predict \ - -o Predict.model_dir="./resnet50_output/best_model/inference" \ - -o Predict.input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg" \ - -o Global.device="xpu:0" -``` - -#### 转换 ONNX 模型 - -如果您有额外的部署需求需要基于 ONNX 实现,我们也提供了专用的工具用于导出 ONNX 模型,参考如下步骤,即可将第一步导出的静态图模型转换为 ONNX 模型: - -a. 安装环境 - -```shell -# 安装 paddle2onnx,该工具支持将 PaddleInference 模型转换为 ONNX 格式 -python -m pip install paddle2onnx -``` - -b. 模型转换 - -```shell -paddle2onnx --model_dir=./resnet50_output/best_model/inference \ - --model_filename=inference.pdmodel \ - --params_filename=inference.pdiparams \ - --save_file=./resnet50_output/best_model/inference.onnx \ - --enable_onnx_checker=True -``` - -该命令会在 `resnet50_output/best_model` 目录下生成 `inference.onnx` 文件 diff --git a/docs/hardware_support/xpu/xpu-gen2_support_cn.md b/docs/hardware_support/xpu/xpu-gen2_support_cn.md deleted file mode 100644 index 9500f652170..00000000000 --- a/docs/hardware_support/xpu/xpu-gen2_support_cn.md +++ /dev/null @@ -1,54 +0,0 @@ -# 昆仑芯 XPU 支持模型 - -飞桨框架在昆仑芯 XPU 上通过精度验证的模型情况如下: - -* PaddleX 使用文档详见:[PaddleX 多硬件使用](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/other_devices_support/multi_devices_use_guide.md) -* PaddleNLP 大语言模型多硬件使用文档详见:[PaddleNLP XPU 大语言模型使用文档](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm/xpu) -* 如果您适配/验证过更多模型,欢迎向飞桨开源社区贡献适配代码,然后邮件联系我们更新本列表 [ext_paddle_oss](ext_paddle_oss@baidu.com) - -| 模型库 | 模型类型 | 模型名称 | 训练 | 推理 | -| - | - | - | - | - | -| PaddleX | 图像分类 | [ResNet18](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/ResNet18.yaml) | √ | √ | -| PaddleX | 图像分类 |
[ResNet34](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/ResNet34.yaml) | √ | √ | -| PaddleX | 图像分类 | [ResNet50](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/ResNet50.yaml) | √ | √ | -| PaddleX | 图像分类 | [ResNet101](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/ResNet101.yaml) | √ | √ | -| PaddleX | 图像分类 | [ResNet152](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/ResNet152.yaml) | √ | √ | -| PaddleX | 图像分类 | [PP-LCNet_x0_25](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/PP-LCNet_x0_25.yaml) | √ | √ | -| PaddleX | 图像分类 | [PP-LCNet_x0_35](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/PP-LCNet_x0_35.yaml) | √ | √ | -| PaddleX | 图像分类 | [PP-LCNet_x0_5](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/PP-LCNet_x0_5.yaml) | √ | √ | -| PaddleX | 图像分类 | [PP-LCNet_x0_75](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/PP-LCNet_x0_75.yaml) | √ | √ | -| PaddleX | 图像分类 | [PP-LCNet_x1_0](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/PP-LCNet_x1_0.yaml) | √ | √ | -| PaddleX | 图像分类 | [PP-LCNet_x1_5](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/PP-LCNet_x1_5.yaml) | √ | √ | -| PaddleX | 图像分类 | [PP-LCNet_x2_0](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/PP-LCNet_x2_0.yaml) | √ | √ | -| PaddleX | 图像分类 | [PP-LCNet_x2_5](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/PP-LCNet_x2_5.yaml) | √ | √ | -| PaddleX | 图像分类 | 
[MobileNetV3_small_x0_35](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/MobileNetV3_small_x0_35.yaml) | √ | √ | -| PaddleX | 图像分类 | [MobileNetV3_small_x0_5](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/MobileNetV3_small_x0_5.yaml) | √ | √ | -| PaddleX | 图像分类 | [MobileNetV3_small_x0_75](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/MobileNetV3_small_x0_75.yaml) | √ | √ | -| PaddleX | 图像分类 | [MobileNetV3_small_x1_0](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/MobileNetV3_small_x1_0.yaml) | √ | √ | -| PaddleX | 图像分类 | [MobileNetV3_small_x1_25](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/MobileNetV3_small_x1_25.yaml) | √ | √ | -| PaddleX | 图像分类 | [MobileNetV3_large_x0_35](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/MobileNetV3_large_x0_35.yaml) | √ | √ | -| PaddleX | 图像分类 | [MobileNetV3_large_x0_5](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/MobileNetV3_large_x0_5.yaml) | √ | √ | -| PaddleX | 图像分类 | [MobileNetV3_large_x0_75](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/MobileNetV3_large_x0_75.yaml) | √ | √ | -| PaddleX | 图像分类 | [MobileNetV3_large_x1_0](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/MobileNetV3_large_x1_0.yaml) | √ | √ | -| PaddleX | 图像分类 | [MobileNetV3_large_x1_25](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/MobileNetV3_large_x1_25.yaml) | √ | √ | -| PaddleX | 图像分类 | [PP-HGNet_small](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/PP-HGNet_small.yaml) | √ | √ | -| PaddleX | 目标检测 | 
[PP-YOLOE_plus-S](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/object_detection/PP-YOLOE_plus-S.yaml) | √ | √ | -| PaddleX | 目标检测 | [PP-YOLOE_plus-M](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/object_detection/PP-YOLOE_plus-M.yaml) | √ | √ | -| PaddleX | 目标检测 | [PP-YOLOE_plus-L](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/object_detection/PP-YOLOE_plus-L.yaml) | √ | √ | -| PaddleX | 目标检测 | [PP-YOLOE_plus-X](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/object_detection/PP-YOLOE_plus-X.yaml) | √ | √ | -| PaddleX | 目标检测 | [PicoDet-S](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/object_detection/PicoDet-S.yaml) | √ | √ | -| PaddleX | 目标检测 | [PicoDet-L](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/object_detection/PicoDet-L.yaml) | √ | √ | -| PaddleX | 语义分割 | [PP-LiteSeg-T](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/semantic_segmentation/PP-LiteSeg-T.yaml) | √ | √ | -| PaddleX | 文本检测 | [PP-OCRv4_server_det](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/text_detection/PP-OCRv4_server_det.yaml) | √ | √ | -| PaddleX | 文本检测 | [PP-OCRv4_mobile_det](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/text_detection/PP-OCRv4_mobile_det.yaml) | √ | √ | -| PaddleX | 文本识别 | [PP-OCRv4_server_rec](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/text_recognition/PP-OCRv4_server_rec.yaml) | √ | √ | -| PaddleX | 文本识别 | [PP-OCRv4_mobile_rec](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/text_recognition/PP-OCRv4_mobile_rec.yaml) | √ | √ | -| PaddleX | 版面分析 | [PicoDet_layout_1x](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/layout_detection/PicoDet_layout_1x.yaml) | √ | √ | -| PaddleX | 图像异常检测 | 
[STFPM](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/anomaly_detection/STFPM.yaml) | √ | √ | -| PaddleX | 人脸检测 | [PicoDet_LCNet_x2_5_face](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta2/paddlex/configs/modules/face_detection/PicoDet_LCNet_x2_5_face.yaml) | √ | √ | -| PaddleX | 时序预测 | [DLinear](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/ts_forecast/DLinear.yaml) | √ | √ | -| PaddleX | 时序预测 | [RLinear](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/ts_forecast/RLinear.yaml) | √ | √ | -| PaddleX | 时序预测 | [NLinear](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/ts_forecast/NLinear.yaml) | √ | √ | -| PaddleNLP | 自然语言理解模型 | [BERT](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/slm/model_zoo/bert) | √ | √ | -| PaddleNLP | 自然语言理解模型 | [ERINE3.0](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/slm/model_zoo/ernie-3.0/configs/modules/default.yml) | √ | √ | -| PaddleNLP | 大语言模型 | [LLaMA](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm/devices/xpu/llama) | √ | √ | diff --git a/docs/hardware_support/xpu/xpu-p800_install_cn.md b/docs/hardware_support/xpu/xpu-p800_install_cn.md index 181393abf3b..c07605990b1 100644 --- a/docs/hardware_support/xpu/xpu-p800_install_cn.md +++ b/docs/hardware_support/xpu/xpu-p800_install_cn.md @@ -22,11 +22,15 @@ lspci -d 2057: -n ## 运行环境准备 -推荐使用飞桨官方发布的昆仑芯 XPU 开发镜像,该镜像预装有昆仑芯基础运行环境库(XRE)。 +您可以基于 docker、pip、源码等不同方式准备飞桨开发环境 + +### 基于 Docker 的方式(推荐) + +我们推荐使用飞桨官方发布的昆仑芯 XPU 开发镜像,该镜像预装有昆仑芯基础运行环境库(XRE)和飞桨 3.0 版本的 SDK。 ```bash # 拉取镜像 -docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-xpu:ubuntu20-x86_64-gcc84-py310 +docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-xpu:3.0.0-xpu-ubuntu20-x86_64-gcc84-py310 ``` ```bash # 参考如下命令,启动容器 @@ -34,7 +38,7 @@ docker run -it --name paddle-xpu-dev -v $(pwd):/work \ -v /usr/local/bin/xpu-smi:/usr/local/bin/xpu-smi \ -w=/work 
--shm-size=128G --network=host --privileged \ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \ - ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-xpu:ubuntu20-x86_64-gcc84-py310 /bin/bash + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-xpu:3.0.0-xpu-ubuntu20-x86_64-gcc84-py310 /bin/bash ``` #### 选项说明及可调整参数 @@ -58,42 +62,33 @@ docker run -it --name paddle-xpu-dev -v $(pwd):/work \ xpu-smi ``` -## 安装飞桨框架 - -**注意**:当前飞桨 develop 分支仅支持 X86 架构,如需昆仑芯 XPU 的 ARM 架构支持,请提交[issue](https://github.com/PaddlePaddle/Paddle/issues)告知我们 - -### 安装方式一:wheel 包安装 - -在启动的 docker 容器中,下载并安装飞桨官网发布的 wheel 包。 +### 基于 pip 安装的方式 ```bash # 下载并安装 wheel 包 -python -m pip install --pre paddlepaddle-xpu -i https://www.paddlepaddle.org.cn/packages/nightly/xpu-p800/ +python -m pip install paddlepaddle-xpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/xpu/ ``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 -### 安装方式二:源代码编译安装 -在启动的 docker 容器中,下载 Paddle 源码并编译,CMAKE 编译选项含义请参见[编译选项表](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#Compile)。 +### 基于源码编译的方式 ```bash # 下载 Paddle 源码 -git clone https://github.com/PaddlePaddle/Paddle.git -b develop +git clone https://github.com/PaddlePaddle/Paddle.git -b release/3.0 cd Paddle # 创建编译目录 mkdir build && cd build # cmake 编译命令 -cmake .. -DPY_VERSION=3.10 -DCMAKE_BUILD_TYPE=Release -DWITH_GPU=OFF -DWITH_XPU=ON -DON_INFER=ON \ - -DWITH_PYTHON=ON -DWITH_MKL=OFF -DWITH_XPU_BKCL=ON -DWITH_TESTING=ON -DWITH_DISTRIBUTE=ON -DWITH_XPU_XRE5=ON -DWITH_XCCL_RDMA=ON +cmake .. 
-DPY_VERSION=3.10 -DCMAKE_BUILD_TYPE=Release -DWITH_GPU=OFF -DWITH_XPU=ON -DON_INFER=ON -DWITH_PYTHON=ON -DWITH_MKL=OFF -DWITH_XPU_BKCL=ON -DWITH_TESTING=OFF -DWITH_DISTRIBUTE=ON -DWITH_XPU_XRE5=ON -DWITH_XCCL_RDMA=ON # make 编译命令 -make -j50 TARGET=HASWELL +make -j128 TARGET=HASWELL # 编译产出在 build/python/dist/ 路径下,使用 pip 安装即可 -pip install -U paddlepaddle_xpu-0.0.0-cp310-cp310-linux_x86_64.whl +python -m pip install -U paddlepaddle_xpu-*-linux_x86_64.whl ``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 + ## 基础功能检查 安装完成后,在 docker 容器中输入如下命令进行飞桨基础健康功能的检查。 diff --git a/docs/hardware_support/xpu/xpu-p800_paddle_tutorial_cn.md b/docs/hardware_support/xpu/xpu-p800_paddle_tutorial_cn.md index d259fe86d57..7f8211934de 100644 --- a/docs/hardware_support/xpu/xpu-p800_paddle_tutorial_cn.md +++ b/docs/hardware_support/xpu/xpu-p800_paddle_tutorial_cn.md @@ -8,20 +8,10 @@ * 考虑到环境差异性,我们推荐使用教程提供的标准镜像完成环境准备: - * 镜像链接: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-xpu:ubuntu20-x86_64-gcc84-py310 + * 镜像链接: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-xpu:3.0.0-xpu-ubuntu20-x86_64-gcc84-py310 -### 环境安装 +* 镜像中默认装有 3.0 版本的 PaddlePaddle -安装 PaddlePaddle - -*该命令会自动安装飞桨主框架每日自动构建的 nightly-build 版本* - -*由于 xpu 代码位于飞桨主框架中,因此我们不需要安装额外的 Custom Device 包* - -```shell -python -m pip install --pre paddlepaddle-xpu -i https://www.paddlepaddle.org.cn/packages/nightly/xpu-p800/ -``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 ## 二、运行示例 飞桨框架集成了经典的视觉模型用于帮助用户快速上手,我们将基于 ResNet50 结构,在 Cifar10 数据集上进行一次快速训练,用于帮助您了解如何基于昆仑芯 XPU P800 进行训练(和 GPU 训练代码相比,差异点仅为 `paddle.set_device("xpu")`) diff --git a/docs/hardware_support/xpu/xpu-p800_paddlex_tutorial_cn.md b/docs/hardware_support/xpu/xpu-p800_paddlex_tutorial_cn.md index 25471e24ef6..e23fb61c8a7 100644 --- a/docs/hardware_support/xpu/xpu-p800_paddlex_tutorial_cn.md +++ b/docs/hardware_support/xpu/xpu-p800_paddlex_tutorial_cn.md @@ -8,19 +8,11 @@ * 考虑到环境差异性,我们推荐使用教程提供的标准镜像完成环境准备: - * 镜像链接: 
ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-xpu:ubuntu20-x86_64-gcc84-py310 + * 镜像链接: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-xpu:3.0.0-xpu-ubuntu20-x86_64-gcc84-py310 ### 环境安装 -1. 安装 PaddlePaddle - -*该命令会自动安装飞桨主框架每日自动构建的 nightly-build 版本* - -*由于 xpu 代码位于飞桨主框架中,因此我们不需要安装额外的 Custom Device 包* - -```shell -python -m pip install --pre paddlepaddle-xpu -i https://www.paddlepaddle.org.cn/packages/nightly/xpu-p800/ -``` +1. 镜像中默认装有 3.0 版本的 PaddlePaddle,无需额外安装 2. 安装 PaddleX 代码库 @@ -36,7 +28,7 @@ cd PaddleX # -e:以可编辑模式安装,当前项目的代码更改,都会直接作用到已经安装的 PaddleX Wheel pip install -e . ``` -⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。 + ## 基于 PaddleX 训练 ResNet50 ### 一、安装 PaddleX 依赖 diff --git a/docs/install/conda/linux-conda.md b/docs/install/conda/linux-conda.md index 2ab515f42f3..ac132adb838 100644 --- a/docs/install/conda/linux-conda.md +++ b/docs/install/conda/linux-conda.md @@ -9,7 +9,7 @@ #### 1.1.1 安装环境 -首先根据具体的 Python 版本创建 Anaconda 虚拟环境,PaddlePaddle 的 Anaconda 安装支持 3.8 - 3.12 版本的 Python 安装环境。 +首先根据具体的 Python 版本创建 Anaconda 虚拟环境,PaddlePaddle 的 Anaconda 安装支持 3.8 - 3.13 版本的 Python 安装环境。 ``` conda create -n paddle_env python=YOUR_PY_VER diff --git a/docs/install/conda/linux-conda_en.md b/docs/install/conda/linux-conda_en.md index 0a7ae144e88..23b80daa8bd 100644 --- a/docs/install/conda/linux-conda_en.md +++ b/docs/install/conda/linux-conda_en.md @@ -9,7 +9,7 @@ #### 1.1.1 Create the Anaconda Virtual Environment -Create virtual environment First create the Anaconda virtual environment according to the specific Python version. The Anaconda installation of PaddlePaddle supports Python version of 3.8 - 3.12. +Create virtual environment First create the Anaconda virtual environment according to the specific Python version. The Anaconda installation of PaddlePaddle supports Python version of 3.8 - 3.13. 
``` conda create -n paddle_env python=YOUR_PY_VER diff --git a/docs/install/conda/macos-conda.md b/docs/install/conda/macos-conda.md index 9700bffc6b5..b2ee13dc37b 100644 --- a/docs/install/conda/macos-conda.md +++ b/docs/install/conda/macos-conda.md @@ -8,7 +8,7 @@ #### 1.1.1 安装环境 -首先根据具体的 Python 版本创建 Anaconda 虚拟环境,PaddlePaddle 的 Anaconda 安装支持 3.8 - 3.12 版本的 Python 安装环境。 +首先根据具体的 Python 版本创建 Anaconda 虚拟环境,PaddlePaddle 的 Anaconda 安装支持 3.8 - 3.13 版本的 Python 安装环境。 ``` conda create -n paddle_env python=YOUR_PY_VER diff --git a/docs/install/conda/macos-conda_en.md b/docs/install/conda/macos-conda_en.md index 0aa348b71c8..0f419fbe34a 100644 --- a/docs/install/conda/macos-conda_en.md +++ b/docs/install/conda/macos-conda_en.md @@ -10,7 +10,7 @@ #### 1.1.1 Create the Anaconda Virtual Environment -Create virtual environment First create the Anaconda virtual environment according to the specific Python version. The Anaconda installation of PaddlePaddle supports Python version of 3.8 - 3.12. +Create virtual environment First create the Anaconda virtual environment according to the specific Python version. The Anaconda installation of PaddlePaddle supports Python version of 3.8 - 3.13. 
``` conda create -n paddle_env python=YOUR_PY_VER diff --git a/docs/install/conda/windows-conda.md b/docs/install/conda/windows-conda.md index a246ceb4470..9fb41fbbd38 100644 --- a/docs/install/conda/windows-conda.md +++ b/docs/install/conda/windows-conda.md @@ -9,7 +9,7 @@ #### 1.1.1 安装环境 -首先根据具体的 Python 版本创建 Anaconda 虚拟环境,PaddlePaddle 的 Anaconda 安装支持 3.8 - 3.12 版本的 Python 安装环境。 +首先根据具体的 Python 版本创建 Anaconda 虚拟环境,PaddlePaddle 的 Anaconda 安装支持 3.8 - 3.13 版本的 Python 安装环境。 ``` conda create -n paddle_env python=YOUR_PY_VER diff --git a/docs/install/conda/windows-conda_en.md b/docs/install/conda/windows-conda_en.md index 5374d3952ec..fababe73fc6 100644 --- a/docs/install/conda/windows-conda_en.md +++ b/docs/install/conda/windows-conda_en.md @@ -10,7 +10,7 @@ #### 1.1.1 Create the Anaconda Virtual Environment -Create virtual environment First create the Anaconda virtual environment according to the specific Python version. The Anaconda installation of PaddlePaddle supports Python version of 3.8 - 3.12. +Create virtual environment First create the Anaconda virtual environment according to the specific Python version. The Anaconda installation of PaddlePaddle supports Python version of 3.8 - 3.13. 
``` conda create -n paddle_env python=YOUR_PY_VER diff --git a/docs/install/docker/linux-docker.md b/docs/install/docker/linux-docker.md index 7d77e3ee2bc..3b68d5197dc 100644 --- a/docs/install/docker/linux-docker.md +++ b/docs/install/docker/linux-docker.md @@ -5,7 +5,7 @@ ## 环境准备 -- 目前支持的系统类型,请见[安装说明](/documentation/docs/zh/install/index_cn.html),请注意目前暂不支持在 CentOS 6 使用 Docker +- 目前支持的系统类型,请见[安装说明](/documentation/docs/zh/install/index_cn.html) - 在本地主机上[安装 Docker](https://docs.docker.com/engine/install/) @@ -31,10 +31,10 @@ * GPU 版的 PaddlePaddle(**建议拉取最新版本镜像,并确保已经成功安装 NVIDIA Container Toolkit**): ``` - docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda11.8-cudnn8.6-trt8.5 + docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda11.8-cudnn8.9-trt8.6 ``` ``` - docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.0-trt8.6 + docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.5-trt10.5 ``` 如果您的机器不在中国大陆地区,可以直接从 DockerHub 拉取镜像: @@ -51,10 +51,10 @@ * GPU 版的 PaddlePaddle(**建议拉取最新版本镜像,并确保已经成功安装 NVIDIA Container Toolkit**): ``` - ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda11.8-cudnn8.6-trt8.5 + docker pull paddlepaddle/paddle:3.0.0-gpu-cuda11.8-cudnn8.9-trt8.6 ``` ``` - ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.0-trt8.6 + docker pull paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.5-trt10.5 ``` 您还可以访问[DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/)获取更多镜像。 @@ -109,7 +109,7 @@ * 使用 GPU 版本的 PaddlePaddle: ``` - docker run --gpus all --name paddle_docker -v $PWD:/paddle --network=host -it ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.0-trt8.6 /bin/bash + docker run --gpus all --name paddle_docker -v $PWD:/paddle --network=host -it 
ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.5-trt10.5 /bin/bash ``` - `--gpus all`: 在 Docker 容器中允许使用 gpu; @@ -121,7 +121,7 @@ - `-it`: 与宿主机保持交互状态; - - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.0-trt8.6`:使用名为`ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle`, tag 为`3.0.0-gpu-cuda12.6-cudnn9.0-trt8.6`的镜像创建 Docker 容器,/bin/bash 进入容器后启动/bin/bash 命令。 + - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.5-trt10.5`:使用名为`ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle`, tag 为`3.0.0-gpu-cuda12.6-cudnn9.5-trt10.5`的镜像创建 Docker 容器,/bin/bash 进入容器后启动/bin/bash 命令。 @@ -148,12 +148,12 @@ 安装了 3.0.0 版本 paddle 的 CPU 镜像,且镜像中预装好了 jupyter,启动 docker 即运行 jupyter 服务 - ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda11.8-cudnn8.6-trt8.5 - 安装了 3.0.0 版本 paddle 的 GPU 镜像,cuda 版本为 11.8,cudnn 版本为 8.6,trt 版本为 8.5 + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda11.8-cudnn8.9-trt8.6 + 安装了 3.0.0 版本 paddle 的 GPU 镜像,cuda 版本为 11.8,cudnn 版本为 8.9,trt 版本为 8.6 - ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.0-trt8.6 - 安装了 3.0.0 版本 paddle 的 GPU 镜像,cuda 版本为 12.6,cudnn 版本为 9.0,trt 版本为 8.6 + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.5-trt10.5 + 安装了 3.0.0 版本 paddle 的 GPU 镜像,cuda 版本为 12.6,cudnn 版本为 9.5,trt 版本为 10.5 diff --git a/docs/install/docker/linux-docker_en.md b/docs/install/docker/linux-docker_en.md index 0c2ad7de88e..5674766567d 100644 --- a/docs/install/docker/linux-docker_en.md +++ b/docs/install/docker/linux-docker_en.md @@ -31,10 +31,10 @@ For domestic users, when downloading docker is slow due to network problems, you * GPU version of PaddlePaddle(**Latest version of gpu image is recommended, and make sure NVIDIA Container Toolkit is installed successfully**): ``` - docker pull 
ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.0-trt8.6 + docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.5-trt10.5 ``` ``` - docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda11.8-cudnn8.6-trt8.5 + docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda11.8-cudnn8.9-trt8.6 ``` If your machine is not in mainland China, you can pull the image directly from DockerHub: @@ -51,10 +51,10 @@ If your machine is not in mainland China, you can pull the image directly from D * GPU version of PaddlePaddle(**Latest version of gpu image is recommended, and make sure NVIDIA Container Toolkit is installed successfully**): ``` - docker pull paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.0-trt8.6 + docker pull paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.5-trt10.5 ``` ``` - docker pull paddlepaddle/paddle:3.0.0-gpu-cuda11.8-cudnn8.6-trt8.5 + docker pull paddlepaddle/paddle:3.0.0-gpu-cuda11.8-cudnn8.9-trt8.6 ``` You can see [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to get more images. 
@@ -85,7 +85,7 @@ You can see [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to g ``` - docker run --gpus all --name paddle_docker -v $PWD:/paddle --network=host -it ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.0-trt8.6 /bin/bash + docker run --gpus all --name paddle_docker -v $PWD:/paddle --network=host -it ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.5-trt10.5 /bin/bash ``` - `--gpus all`: gpu resources can be used in Docker container; @@ -98,7 +98,7 @@ You can see [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to g - `-v $PWD:/paddle`: Specifies to mount the current path of the host (PWD variable in Linux will expand to the absolute path of the current path) to the /paddle directory inside the container; - - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.0-trt8.6`: Specify the name of the image to be used. You can view it through the 'docker images' command. /bin/Bash is the command to be executed in Docker + - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.5-trt10.5`: Specify the name of the image to be used. You can view it through the 'docker images' command. /bin/Bash is the command to be executed in Docker * Use CPU version of PaddlePaddle with jupyter: @@ -151,12 +151,12 @@ Now you have successfully used Docker to install PaddlePaddle. For more informat CPU image of paddle version 3.0.0 is installed, and jupyter is pre-installed in the image. 
Start the docker to run the jupyter service - ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.0-trt8.6 - GPU image of paddle version 3.0.0 is installed, cuda version is 12.6, cudnn version is 9.0, trt version is 8.6 + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.5-trt10.5 + GPU image of paddle version 3.0.0 is installed, cuda version is 12.6, cudnn version is 9.5, trt version is 10.5 - ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda11.8-cudnn8.6-trt8.5 - GPU image of paddle version 3.0.0 is installed, cuda version is 11.8, cudnn version is 8.6, trt version is 8.5 + ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda11.8-cudnn8.9-trt8.6 + GPU image of paddle version 3.0.0 is installed, cuda version is 11.8, cudnn version is 8.9, trt version is 8.6 diff --git a/docs/install/index_cn.rst b/docs/install/index_cn.rst index dca23a0fb0f..bfe7fc8d0dc 100644 --- a/docs/install/index_cn.rst +++ b/docs/install/index_cn.rst @@ -20,10 +20,10 @@ **1. 操作系统要求:** -* Windows 7 / 8 / 10/ 11,专业版 / 企业版 +* Windows 7 / 8 / 10 / 11,专业版 / 企业版 * Ubuntu 20.04 / 22.04 -* CentOS 7 -* macOS 10.x/11.x/12.x/13.x/14.x +* almalinux 8 +* macOS 12.x/13.x/14.x/15.x * 操作系统要求是 64 位版本 **2. 处理器要求** @@ -33,7 +33,7 @@ **3. Python 和 pip 版本要求:** -* Python 的版本要求 3.8/3.9/3.10/3.11/3.12 +* Python 的版本要求 3.8/3.9/3.10/3.11/3.12/3.13 * Python 具有 pip, 且 pip 的版本要求 20.2.2+ * Python 和 pip 要求是 64 位版本 @@ -66,7 +66,7 @@ 4. 检查 Python 的版本 - 使用以下命令确认是 3.8/3.9/3.10/3.11/3.12 + 使用以下命令确认是 3.8/3.9/3.10/3.11/3.12/3.13 :: python --version @@ -88,6 +88,10 @@ 7. 如果您希望使用 `pip `_ 进行安装 PaddlePaddle 可以直接使用以下命令: + 注意: + + * 如果你想要安装 paddlepaddle,该版本要求 libstdc++.so.6 的版本大于 3.4.25。为了满足此要求,您可以选择安装 GCC 8 或者更高的 GCC 版本,或者单独升级 libstdc++库。 + (1). **CPU 版本** :如果您只是想安装 CPU 版本请参考如下命令安装 安装 CPU 版本的命令为: @@ -97,16 +101,10 @@ (2). 
**GPU 版本** :如果您想使用 GPU 版本请参考如下命令安装 - 注意: - - * 如果您想要安装 CUDA 12.3 版本,该版本需要 libstdc++.so.6 的版本大于 3.4.30。为了满足此要求,您可以选择安装 GCC 12 版本,或者单独升级 libstdc++库。 - - * 如果你想要安装 CUDA 11.8 版本,该版本要求 libstdc++.so.6 的版本大于 3.4.25。为了满足此要求,您可以选择安装 GCC 8 或者更高的 GCC 版本,或者单独升级 libstdc++库。 - - 安装 GPU cuda12.3 版本的命令为: + 安装 GPU cuda12.6 版本的命令为: :: - python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu123/ + python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu126/ 安装 GPU cuda11.8 版本的命令为: :: diff --git a/docs/install/index_en.rst b/docs/install/index_en.rst index 4242a6e8d1f..431ef2e88c3 100644 --- a/docs/install/index_en.rst +++ b/docs/install/index_en.rst @@ -9,8 +9,7 @@ Important updates ---------------------- -* Add support for python3.12, and no longer supports python3.7 -* Add support for CUDA 12.0, and no longer supports CUDA 10.2 +* Paddle supports user installation without depending on CUDA and cuDNN, and automatically handles the installation of CUDA and cuDNN. ------------------------ @@ -23,10 +22,10 @@ The manuals will guide you to build and install PaddlePaddle on your 64-bit desk 1. Operating system requirements: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -* Windows 7 / 8 / 10, Pro/Enterprise -* Ubuntu 18.04 / 20.04 -* CentOS 7 -* macOS 10.x/11.x/12.x/13.x/14.x +* Windows 7 / 8 / 10 / 11, Pro/Enterprise +* Ubuntu 20.04 / 22.04 +* almalinux 8 +* macOS 12.x/13.x/14.x/15.x * 64-bit operating system is required 2. Processor requirements: @@ -38,180 +37,99 @@ The manuals will guide you to build and install PaddlePaddle on your 64-bit desk 3. Version requirements of python and pip: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -* Python requires version 3.8/3.9/3.10/3.11/3.12 +* Python requires version 3.8/3.9/3.10/3.11/3.12/3.13 * Python needs pip, and pip requires version 20.2.2 or above * Python and pip requires 64-bit -4. 
PaddlePaddle's support for GPU: ->>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> +**First Installation Method: Using pip for installation** +You can choose any of the four methods: "Using pip for installation", "Using conda for installation", "Using docker for installation", "Compiling from source code for installation". -* Currently, **PaddlePaddle** supports **CUDA** driver of **NVIDIA** graphics card and **ROCm** driver of **AMD** card. -* You need to install `cuDNN `_ , and version 7.6+ is required(For CUDA11) -* If you need GPU multi-card mode, you need to install `NCCL 2 `_ +This section will introduce the installation method using pip. - * Only Ubuntu/CentOS support NCCL 2 -* You need to install `CUDA `_ , depending on your system, there are different requirements for CUDA version: +1.You need to ensure that your operating system meets the requirements listed above. - * Windows install GPU version +2.You need to ensure that your processor meets the requirements listed above. - * Windows 7 / 8 / 10 support CUDA 11.0/11.2/11.6/11.8/12.0 single-card mode - * don't support install using **nvidia-docker** - * Ubuntu install GPU version +3.Ensure that the Python where you need to install PaddlePaddle is in your expected location, as your computer may have multiple Pythons. - * Ubuntu 18.04 supports CUDA (11.0 - 12.0) - * Ubuntu 20.04 supports CUDA (11.0 - 12.0) - * If you install using **nvidia-docker** , it supports CUDA 11.2/11.7/12.0 - * CentOS install GPU version + Use the following command to output the Python path, depending on your environment you may need to replace all the python in the command line in the instructions with the specific Python path. 
- * If you install using native **pip** : + In the Windows environment, the command to output the Python path is: - * CentOS 7 supports CUDA (11.0 - 12.0) - * If you compile and install using native source code: + :: - * CentOS 7 supports CUDA (11.0 - 12.0) - * If you install using **nvidia-docker** , CentOS 7 supports CUDA 11.2/11.7/12.0 - * macOS isn't supported: PaddlePaddle has no GPU support in Mac OS platform + where python -Please make sure your environment meets the above conditions. If you have other requirements, please refer to `Appendix `_ . + In the macOS/Linux environment, the command to output the Python path is: -5. PaddlePaddle's support for NCCL: ->>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> + :: -* Support for Windows + which python - * not support NCCL -* Support for Ubuntu +4.Check the Python version - * Ubuntu 18.04: + Use the following command to confirm it is 3.8/3.9/3.10/3.11/3.12/3.13 - * support NCCL v2.4.7 / v2.16.5 under CUDA11 - * Ubuntu 20.04: - - * support v2.4.7 / 2.16.5 under CUDA11 -* Support for CentOS - - * CentOS 7: - - * support NCCL v2.4.7-v2.16.5 under CUDA11 -* Support for macOS - - * not support NCCL - - -The first way to install: use pip to install ->>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - -You can choose any of the four ways to install: "use pip to install", "use Conda to install", "use Docker to install", "compiling from the source code" - -This section describes how to use pip to install. - -1. You need to confirm that your operating system meets the requirements listed above - -2. You need to confirm that your processor meets the requirements listed above - -3. Confirm that the Python where you need to install PaddlePaddle is your expected location, because your computer may have multiple Python - - Use the following command to output Python path. 
Depending on your environment, you may need to replace Python in all command lines in the description with specific Python path - - In the Windows environment, the command to output Python path is: - - :: - - where python - - In the macOS/Linux environment, the command to output Python path is: - - :: - - which python - - -4. Check the version of Python - - Confirm the Python is 3.8/3.9/3.10/3.11/3.12 using command :: python --version -5. Check the version of pip and confirm it is 20.2.2 or above +5.Check the pip version, confirm it is 20.2.2+ :: python -m ensurepip python -m pip --version - -6. Confirm that Python and pip is 64 bit,and the processor architecture is x86_64(or called x64、Intel 64、AMD64)architecture. Currently. The first line below outputs "64bit", and the second line outputs "x86_64", "x64" or "AMD64" : +6.Confirm that Python and pip are 64 bit, and the processor architecture is x86_64 (also known as x64, Intel 64, AMD64). The first line of the following output is "64bit", and the second line output is "x86_64", "x64" or "AMD64": :: python -c "import platform;print(platform.architecture()[0]);print(platform.machine())" +7.If you want to use pip _ to install PaddlePaddle, you can directly use the following command: -7. If you want to use `pip `_ to install PaddlePaddle, you can use the command below directly: - - (1). **CPU version** : If you only want to install CPU version, please refer to command below - - Command to install CPU version is: - :: - - python -m pip install paddlepaddle==2.6.0 -i https://mirror.baidu.com/pypi/simple - - or - - python -m pip install paddlepaddle==2.6.0 -i https://pypi.tuna.tsinghua.edu.cn/simple - - - (2). 
**GPU version** : If you only want to install GPU version, please refer to command below - - - Note: - - * You need to confirm that your GPU meets the requirements listed above - - :: - - python -m pip install paddlepaddle-gpu==2.6.0 -i https://mirror.baidu.com/pypi/simple - - or - - python -m pip install paddlepaddle-gpu==2.6.0 -i https://pypi.tuna.tsinghua.edu.cn/simple + Note: + * If you want to install paddlepaddle, the version requires libstdc++.so.6 version greater than 3.4.25. To meet this requirement, you can choose to install GCC 8 or a higher GCC version, or upgrade the libstdc++ library separately. * - Please confirm that the Python where you need to install PaddlePaddle is your expected location, because your computer may have multiple Python. Depending on the environment, you may need to replace Python in all command lines in the instructions with Python 3 or specific Python path. + (1). **CPU version** : If you just want to install the CPU version, please refer to the following command for installation -8. Verify installation + The command to install the CPU version is: + :: + python -m pip install --pre paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/ - After the installation is complete, you can use `python` to enter the Python interpreter and then use `import paddle` and then `paddle.utils.run_check()` to verify that the installation was successful. + (2). **GPU version** : If you want to use the GPU version, please refer to the following command for installation - If `PaddlePaddle is installed successfully!` appears, it means the installation was successful. + The command to install the GPU cuda12.6 version is: + :: + python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu126/ + The command to install the GPU cuda11.8 version is: + :: + python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu118/ -9. 
For more information to help, please refer to: + Please make sure that the Python installation into which you are installing PaddlePaddle is the one you expect, as your computer may have multiple Python installations. Depending on your environment, you may need to replace every python in the commands in these instructions with the specific Python path. - `install under Ubuntu `_ +8. Verify the installation - `install under macOS `_ + Use `python` to enter the Python interpreter, enter `import paddle`, then enter `paddle.utils.run_check()`. - `install under Windows `_ + If `PaddlePaddle is installed successfully!` appears, the installation was successful. +9. For more help, please refer to: + `PIP Installation under Linux `_ -The second way to install: compile and install with container ->>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> + `PIP Installation under macOS `_ -- We recommend that you use `NVIDIA PaddlePaddle Container `_ for your development environment installation. -- Pros - 1. Lastest version of CUDA - 2. Newer verison of Ubuntu OS(18.04) - 3. Performance and development efficiency have been optimized by NVIDIA + `PIP Installation under Windows `_ + **Second Installation Method: Using Source Code Compilation for Installation** -The third way to install: compile and install with source code ->>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> + - If you only use PaddlePaddle, we recommend installing it with **pip**. -- If you use PaddlePaddle only, we suggest you installation methods **pip** to install. -- If you need to develop PaddlePaddle, please refer to `compile from source code `_ + - If you need to develop PaddlePaddle, please refer to: `Compiling from Source `_ ..
toctree:: :hidden: @@ -220,6 +138,5 @@ The third way to install: compile and install with source code conda/fromconda_en.rst docker/fromdocker_en.rst compile/fromsource_en.rst - install_xpu_en.md install_NGC_PaddlePaddle_en.rst Tables_en.md diff --git a/docs/install/pip/linux-pip.md b/docs/install/pip/linux-pip.md index 749bc35bd54..e88661b92d5 100644 --- a/docs/install/pip/linux-pip.md +++ b/docs/install/pip/linux-pip.md @@ -27,7 +27,7 @@ * 需要确认 python 的版本是否满足要求 - * 使用以下命令确认是 3.8/3.9/3.10/3.11/3.12 + * 使用以下命令确认是 3.8/3.9/3.10/3.11/3.12/3.13 python3 --version @@ -47,7 +47,6 @@ ``` - * 默认提供的安装包需要计算机支持 MKL, Intel 芯片都支持 MKL @@ -77,7 +76,7 @@ #### 2.2 GPU 版的 PaddlePaddle -2.2.1 CUDA11.8 的 PaddlePaddle(依赖 gcc8+, 如果需要使用 TensorRT 可自行安装 TensorRT8.5.3.1) +2.2.1 CUDA11.8 的 PaddlePaddle(如果需要使用 TensorRT 可自行安装 TensorRT8.5.3.1) ``` @@ -85,7 +84,7 @@ ``` -2.2.2 CUDA12.6 的 PaddlePaddle(依赖 gcc12+, 如果需要使用 TensorRT 可自行安装 TensorRT8.6.1.6) +2.2.2 CUDA12.6 的 PaddlePaddle(如果需要使用 TensorRT 可自行安装 TensorRT10.5.0.18) ``` diff --git a/docs/install/pip/linux-pip_en.md b/docs/install/pip/linux-pip_en.md index 12b8d9799ae..0b997504308 100644 --- a/docs/install/pip/linux-pip_en.md +++ b/docs/install/pip/linux-pip_en.md @@ -28,7 +28,7 @@ * You need to confirm whether the version of Python meets the requirements - * Use the following command to confirm that it is 3.8/3.9/3.10/3.11/3.12 + * Use the following command to confirm that it is 3.8/3.9/3.10/3.11/3.12/3.13 python3 --version @@ -87,7 +87,7 @@ You can choose the following version of PaddlePaddle to start installation: #### 2.2 GPU Version of PaddlePaddle -2.2.4 If you are using CUDA 11.8(Dependent on GCC8+, If you need to use TensorRT, you can install TensorRT 8.5.3.1 yourself) +2.2.4 If you are using CUDA 11.8(If you need to use TensorRT, you can install TensorRT 8.5.3.1 yourself) ``` @@ -95,7 +95,7 @@ You can choose the following version of PaddlePaddle to start installation: ``` -2.2.5 If you are using CUDA 12.6(Dependent on GCC8+, If you 
need to use TensorRT, you can install TensorRT 8.6.1.6 yourself) +2.2.5 If you are using CUDA 12.6(If you need to use TensorRT, you can install TensorRT 10.5.0.18 yourself) ``` python3 -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/ diff --git a/docs/install/pip/windows-pip.md b/docs/install/pip/windows-pip.md index e92830fe054..8123d8c130b 100644 --- a/docs/install/pip/windows-pip.md +++ b/docs/install/pip/windows-pip.md @@ -8,7 +8,7 @@ * 需要确认 python 的版本是否满足要求 - * 使用以下命令确认是 3.8/3.9/3.10/3.11/3.12 + * 使用以下命令确认是 3.8/3.9/3.10/3.11/3.12/3.13 ``` python --version @@ -70,7 +70,7 @@ python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/ ``` -2.2.5 CUDA12.6 的 PaddlePaddle(如果需要使用 TensorRT 可自行安装 TensorRT8.6.1.6) +2.2.5 CUDA12.6 的 PaddlePaddle(如果需要使用 TensorRT 可自行安装 TensorRT10.5.0.18) ``` python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/ diff --git a/docs/install/pip/windows-pip_en.md b/docs/install/pip/windows-pip_en.md index 0ae1c2f333c..7698313ec83 100644 --- a/docs/install/pip/windows-pip_en.md +++ b/docs/install/pip/windows-pip_en.md @@ -6,7 +6,7 @@ * Confirm whether the Python version meets the requirements - * Use the following command to confirm that it is 3.8+/3.9+/3.10+/3.11+/3.12+ + * Use the following command to confirm that it is 3.8/3.9/3.10/3.11/3.12/3.13 python --version @@ -66,7 +66,7 @@ You can choose the following version of PaddlePaddle to start installation: python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/ ``` -2.2.5 If you are using CUDA 12.6(If you need to use TensorRT, you can install TensorRT 8.6.1.6 yourself) +2.2.5 If you are using CUDA 12.6(If you need to use TensorRT, you can install TensorRT 10.5.0.18 yourself) ``` python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/ diff --git 
a/docs/release_note_cn.md b/docs/release_note_cn.md index 0aaa2b14b52..e39d3f2e44c 100644 --- a/docs/release_note_cn.md +++ b/docs/release_note_cn.md @@ -1,462 +1,541 @@ -# 3.0 Beta Release Note -本版本的核心特性主要包括动静统一自动并行技术和神经网络编译器自动优化等新技术,旨在应对当前深度学习领域的新挑战。飞桨框架 3.0 Beta 版本延续了 2.x 版本动静统一、训推一体的设计理念,其开发接口全面兼容 2.x 版本。这意味着,使用 2.x 版本开发的代码,在绝大多数情况下无需修改,即可直接在 3.x 版本上运行。几个重点特性具体展开说明如下: -- 动静统一自动并行:为了降低大模型的编程难度,飞桨还优化了动静统一的半自动并行编程范式,显著简化了编程的复杂度。开发者无需深入研究手动并行编程的复杂概念和 API,只需进行少量的张量切分标注,即可完成混合并行模型的构建。框架能够自动推导分布式切分状态并添加通信算子,同时还支持一键动转静分布式训练,从而大幅简化了混合并行训练代码的开发过程。动静统一方面,飞桨通过采用基于字节码的动静转换技术,全面升级了其动转静训练能力,支持自适应的图构建功能。在 700 多个飞桨产业级模型上进行了验证,实现了一键动转静训练 100%的成功率。 -- 神经网络编译器自动优化:飞桨神经网络编译器 CINN(Compiler Infrastructure for Neural Networks)采用与框架一体化的设计,能够支持生成式模型、科学计算模型等多种模型的高效训练与可变形状推理,为计算灵活性与高性能之间提供了一个良好的平衡点。通过算子的自动融合和代码生成技术,Llama2 和 Stable Diffusion 模型的性能提升了 30%。 -- 高阶自动微分:为了更好支持科学计算等场景,飞桨框架设计并实现了基于组合算子机制的高阶自动微分技术,结合神经网络编译器自动优化技术,我们测试了超过 40 多个科学计算场景的微分方程,其求解速度领先业界同类产品 70%。 -- 高扩展中间表示 :为了提升飞桨框架的可扩展性,我们研发了高扩展中间表示 PIR(Paddle Intermediate Representation)。这一表示系统性地抽象了底层核心概念,提供了灵活且高效的组件。PIR 作为基础设施,支撑着动转静、自动微分、自动并行、组合算子、图优化等多项技术,并广泛应用于分布式训练、模型压缩、推理部署等场景。通过 PIR 提供的 DRR(Declarative Rewrite Rule)机制,Pass 的开发成本可以降低 60%。我们对超过 900 个模型配置进行了测试,结果显示,在使用 PIR 后,推理的整体性能提升了超过 10%。 -- 多硬件适配:飞桨为大模型硬件适配提供了功能完善且低成本的方案。新硬件仅需适配 30 余个接口,即可支持大模型的训练、压缩与推理。同时,飞桨提供了基于编译器的硬件接入方式,硬件厂商只需以插件的形式实现编译器的代码生成后端,便能实现与飞桨框架的高效适配。飞桨硬件接入本次新增了对 4 款硬件昆仑 XPU、昇腾 NPU、海光 DCU 和寒武纪 MLU 的日常发版支持。 +# 3.0 Release Note -此版本包含了对框架 2.x 版本部分已有功能的持续改进,同时本版本的新特性在使用体验、性能、二次开发便利度以及硬件适配能力等方面带来了显著提升。除了上述核心特性外,此版本在用户体验层面持续丰富并增强了满足更多场景的 API 功能,针对大模型场景优化完善了分布式并行策略优化和推理功能增强,在编译安装方面做了比较彻底的易用性改进,对依赖包的安装方式和版本进行了全新同步升级,对系统安全进行了全面加固,对产品文档也进行了全面的纠错检查,同时也对一些废弃代码做了大量的清理以保证架构的简洁性。飞桨 3.0 Beta 版本在不使用新特性的情况下,表现仍然是成熟稳定的,每个新特性都提供了可灵活进行控制的开关,方便用户快速了解相关产品功能和体验对比。 +作为中国首个自主研发的产业级深度学习平台,飞桨一直坚持开源路线,支撑产业智能化升级。飞桨框架 3.0 版本不仅延续了飞桨框架 2.0 系列动静统一、训推一体的特性,更在自动并行、神经网络编译器、高阶自动微分等方面取得突破,为大模型时代的技术创新与产业应用提供了强大支撑,为开发者打造了一站式、高性能的深度学习开发体验。无论是前沿算法研究还是产业级大模型落地,飞桨框架 3.0 都将成为开发者的首选利器。重点特性说明如下: -## 1.用户体验升级 
+- **动静统一自动并行:** 这一功能大幅度降低了产业开发和训练的成本。用户只需在单卡基础上进行少量的张量切分标记,飞桨框架便会自动完成分布式切分信息的推导,并添加通信算子以确保逻辑的正确性。同时,根据模型结构和集群信息,结合显存和调度层的优化,飞桨能自动寻找最高效的分布式并行策略,从而大幅降低混合并行训练的开发成本,使开发者能够更专注于模型和算法的创新。自动并行架构进行了深入的验证和打磨,以更好地支持纯文稠密模型、纯文稀疏模型(MoE)和多模态理解模型等常见大模型场景的预训练+精调流程;完善算子的切分推导规则,并支持将自动并行训练参数转化成手动并行参数进行下游推理,自动并行达到了全面可用的状态,帮助用户降低大模型并行程序的开发成本。同时,为了进一步简化用户的分布式开发流程,推出全新的`paddle.distributed.parallel`接口,基于对分布式张量标记语法的封装,支持用户在模型组网外不侵入地配置数据并行、模型并行、流水并行等常见的并行策略。此外,静态图自动并行架构基于 PIR 完成了全面的升级,底层的基础组件、核心模块、并行策略和性能优化策略均统一基于扩展的 PIR `DistDialect`进行实现,进一步增强了自动并行的动静一致性,并在 Llama 系列模型上性能达到了持平甚至领先手动并行方式的水平。 +- **大模型训推一体:** 自 2.0 版本起,飞桨便采用了“动静统一、训推一体”的设计理念,3.0 版本也将继续秉持这一理念。得益于动静统一的架构和接口设计,飞桨能够完整支持动态图和静态图这两种不同的运行模式,并且具备出色的整图导出能力。飞桨的动转静整图导出成功率高达 95%,高于 PyTorch 的 62%。“训推一体”意味着能够在同一套框架下,尽可能复用训练和推理的代码,特别是复用模型组网代码。在完成模型的开发训练后,只需进行少量的开发工作,即可实现快速推理部署。这一特性为产业提供了极致的开发体验。它使训练和推理的能力能够相互复用,为大模型的全流程提供了统一的开发体验和极致的训练效率。通过动转静的工作,训练和推理的工作得以无缝衔接。支持多款主流大模型、DeepSeek-R1 满血版实现单机部署,吞吐提升一倍。 +- **科学计算高阶微分:** 飞桨框架 3.0 为科学计算提供了高阶自动微分、编译优化和分布式训练能力的支撑。英伟达 Modulus 的 41 个不同方程实验显示,飞桨的微分方程求解速度比 PyTorch 开启编译器优化后的版本平均快 115%。同时,飞桨还建设了面向通用数理问题求解的赛桨 PaddleScience 以及专注于生物计算的螺旋桨 PaddleHelix 工具包。此外,飞桨框架 3.0 还原生支持复数技术体系,这对于气象预报、汽车飞行器气动分析等场景下的数据特征分析具有重要意义。 +- **神经网络编译器:** 这一功能显著降低了性能优化的成本。飞桨的编译器采用与框架一体化的设计,能够支持生成式模型、科学计算模型等多种模型的高效训练与可变形状推理,在计算灵活性与高性能之间提供了良好的平衡点。使用 CINN 编译器后超过 60%的 模型有显著性能提升,平均提升达 27.4%。CINN 神经网络编译器在完备性、性能表现等方面效果全面提升。此版本中,我们对编译器前端、后端各个环节进行了全面优化:包括新增反向计算图自动 Re-Compute 机制、前端 Pass 性能优化、符号推导机制升级、算子融合策略优化、后端 Schedule 策略和下标表达式化简能力增强等,同时排查并修复了大量正确性和性能问题,系统化的提升了编译器的通用优化能力。 +- **异构多芯适配:** 飞桨的重要特色之一是适配异构多芯并充分释放硬件潜能。在接入机制上,飞桨提供了简洁高效的抽象接口和基础算子体系,降低了适配成本。在运行机制上,它优化了调度编排和存储共享等机制,提升了调度效率。从算子内核角度,飞桨提供了编译器自动融合调优方案,以提升端到端的性能。同时,飞桨还为新硬件厂商建设了代码合入、持续集成、模型回归测试等研发基础设施。这些机制保障了新硬件被纳入飞桨的正常发版体系中,用户无需编译即可直接安装试用。飞桨这种功能完善、低成本接入的机制吸引了硬件厂商共同为飞桨贡献了 4001 个 PR,共包含 26584 个 commits。 -### 不兼容升级 -- 飞桨 API 支持隐式类型提升。在加减乘除等最常用的计算中,如果两个输入的数据类型不一样,就需要确定输出的数据类型问题。飞桨历史上的现状是部分支持且实际规则并不清楚,客观上表现为动静不一致、API 和运算符重载不一致 及 不符合交换率,特别是在大模型广泛使用 bf16/fp16 与 fp32 进行混合计算时容易出现非预期问题且难以定位。飞桨从 3.0 
beta 版本开始,明确了[隐式数据类型提升规则](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/advanced/auto_type_promotion_cn.html),其中详细定义了 Tensor 与 Tensor 和 Tensor 与 1 个数(Scalar)计算结果的类型,保证了计算符合交换律,运算符重载与二元 API 结果一致,动态图与静态图结果一致。更符合用户理解和业界习惯。[#60638](https://github.com/PaddlePaddle/Paddle/pull/60638), [#63842](https://github.com/PaddlePaddle/Paddle/pull/63842), [#60011](https://github.com/PaddlePaddle/Paddle/pull/60011) +除了上述核心特性外,**高扩展中间表示**为了提升飞桨框架的可扩展性,我们研发了高扩展中间表示 PIR(Paddle Intermediate Representation)。这一表示系统性地抽象了底层核心概念,提供了灵活且高效的组件。PIR 作为基础设施,支撑着动转静、自动微分、自动并行、组合算子、图优化等多项技术,并广泛应用于分布式训练、模型压缩、推理部署等场景。通过 PIR 提供的 DRR(Declarative Rewrite Rule)机制,Pass 的开发成本可以降低 60%。同时 PIR 完成在全场景的验证,并默认开启,支持一键动转静,保证了框架卓越的性能表现和良好的拓展性。对框架 2.0 版已有功能的持续改进,同时新特性在使用体验、性能、二次开发便利度以及硬件适配能力等方面带来了显著提升。此版本在用户体验层面持续丰富并增强了满足更多场景的 API 功能,针对大模型场景优化完善了分布式并行策略优化和推理功能增强,在编译安装方面做了比较彻底的易用性改进,对依赖包的安装方式和版本进行了全新同步升级,对系统安全进行了全面加固,对产品文档也进行了全面的纠错检查,同时也对一些废弃代码做了大量的清理以保证架构的简洁性。 -### 废弃功能 -- 支持 0 维 Tensor 已经稳定了 2 个版本,本版本取消了在一些情况下将 0 维 Tensor 转成只含 1 个元素的 1 维 Tensor 的开关`FLAGS_set_to_1d`,这个开关是为了兼容一些套件中用 1 个元素的 1 维 Tensor 表示 0 维 Tensor 的不正确写法。即当前飞桨完全区分 0 维 Tensor 和只含 1 个元素的 1 维 Tensor 的语义,两者不等价。[#61227](https://github.com/PaddlePaddle/Paddle/pull/61227) +## 不兼容升级 + +飞桨 API 支持隐式类型提升。在加减乘除等最常用的计算中,如果两个输入的数据类型不一样,就需要确定输出的数据类型问题。飞桨历史上的现状是部分支持且实际规则并不清楚,客观上表现为动静不一致、API 和运算符重载不一致 及 不符合交换率,特别是在大模型广泛使用 bf16/fp16 与 fp32 进行混合计算时容易出现非预期问题且难以定位。飞桨从 3.0 beta 版本开始,明确了[隐式数据类型提升规则](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/advanced/auto_type_promotion_cn.html),其中详细定义了 Tensor 与 Tensor 和 Tensor 与 1 个数(Scalar)计算结果的类型,保证了计算符合交换律,运算符重载与二元 API 结果一致,动态图与静态图结果一致。更符合用户理解和业界习惯。https://github.com/PaddlePaddle/Paddle/pull/60638, https://github.com/PaddlePaddle/Paddle/pull/63842, https://github.com/PaddlePaddle/Paddle/pull/60011 + +## 废弃功能 + +支持 0 维 Tensor 已经稳定了 2 个版本,本版本取消了在一些情况下将 0 维 Tensor 转成只含 1 个元素的 1 维 Tensor 的开关 FLAGS_set_to_1d,这个开关是为了兼容一些套件中用 1 个元素的 1 维 Tensor 表示 0 维 Tensor 的不正确写法。即当前飞桨完全区分 0 维 Tensor 和只含 1 个元素的 
1 维 Tensor 的语义,两者不等价。https://github.com/PaddlePaddle/Paddle/pull/61227 -### 新增 API 功能 -此版本相比上一个版本新增 126 个 API,API 功能更加丰富,以更好支持大模型、科学计算等需求,包括: -- 新增 Tensor 计算类 API。`paddle.gammaln`, `paddle.gammainc`, `paddle.gammaincc`, `paddle.sinc`, `paddle.pdist`, `paddle.histogramdd`,`paddle.signbit`, `paddle.copysign`, `paddle.bitwise_right_shift/bitwise_left_shift`, `paddle.isposinf/isneginf/isreal`, `paddle.isin`, `paddle.hsplit/dsplit`, `paddle.column_stack/row_stack/dstack/hstack/vstack`, `paddle.slice_scatter`, `paddle.masked_scatter` [#60553](https://github.com/PaddlePaddle/Paddle/pull/60553), [#59311](https://github.com/PaddlePaddle/Paddle/pull/59311), [#59357](https://github.com/PaddlePaddle/Paddle/pull/59357), [#63521](https://github.com/PaddlePaddle/Paddle/pull/63521), [#57869](https://github.com/PaddlePaddle/Paddle/pull/57869), [#57880](https://github.com/PaddlePaddle/Paddle/pull/57880), [#57882](https://github.com/PaddlePaddle/Paddle/pull/57882), [#60150](https://github.com/PaddlePaddle/Paddle/pull/60150), [#57785](https://github.com/PaddlePaddle/Paddle/pull/57785), [#58092](https://github.com/PaddlePaddle/Paddle/pull/58092), [#63523](https://github.com/PaddlePaddle/Paddle/pull/63523), [#64001](https://github.com/PaddlePaddle/Paddle/pull/64001), [#58917](https://github.com/PaddlePaddle/Paddle/pull/58917), [#59127](https://github.com/PaddlePaddle/Paddle/pull/59127), [#59973](https://github.com/PaddlePaddle/Paddle/pull/59973), [#59383](https://github.com/PaddlePaddle/Paddle/pull/59383) -- 新增概率分布类 API。`paddle.distribution.ContinuousBernoulli`, `paddle.distribution.MultivariateNormal`, `paddle.distribution.Exponential`, `paddle.distribution.Gamma`, `paddle.distribution.Binomial`, `paddle.distribution.Poisson` [#58004](https://github.com/PaddlePaddle/Paddle/pull/58004), [#57899](https://github.com/PaddlePaddle/Paddle/pull/57899), [#57856](https://github.com/PaddlePaddle/Paddle/pull/57856) -- 新增优化器类 API。`paddle.optimizer.ASGD`, `paddle.optimizer.NAdam`, 
`paddle.optimizer.RAdam`, `paddle.optimizer.Rprop` [#58834](https://github.com/PaddlePaddle/Paddle/pull/58834), [#63671](https://github.com/PaddlePaddle/Paddle/pull/63671), [#58851](https://github.com/PaddlePaddle/Paddle/pull/58851) -- 新增线性代数类 API。`paddle.linalg.matrix_exp` [#59715](https://github.com/PaddlePaddle/Paddle/pull/59715) -- 新增其他 API。`paddle.bernoulli_`, `paddle.nn.ZeroPad1D/ZeroPad3D`, `paddle.nn.AdaptiveLogSoftmaxWithLoss`, `paddle.Tensor.apply` [#64252](https://github.com/PaddlePaddle/Paddle/pull/64252), [#59690](https://github.com/PaddlePaddle/Paddle/pull/59690), [#63728](https://github.com/PaddlePaddle/Paddle/pull/63728), [#63302](https://github.com/PaddlePaddle/Paddle/pull/63302), [#59374](https://github.com/PaddlePaddle/Paddle/pull/59374),[#63227](https://github.com/PaddlePaddle/Paddle/pull/63227) +## 1. 用户体验升级 -### 部分 API 功能增强 -- 增强了约 30 个 API 以支持复数计算,如`paddle.log`, `paddle.log1p`, `paddle.square`, `paddle.reciprocal`等,进而扩展对更多科学计算场景的支持。[#62448](https://github.com/PaddlePaddle/Paddle/pull/62448), [#60821](https://github.com/PaddlePaddle/Paddle/pull/60821), [#60897](https://github.com/PaddlePaddle/Paddle/pull/60897), [#62764](https://github.com/PaddlePaddle/Paddle/pull/62764), [#59536](https://github.com/PaddlePaddle/Paddle/pull/59536), [#59529](https://github.com/PaddlePaddle/Paddle/pull/59529), [#63207](https://github.com/PaddlePaddle/Paddle/pull/63207), [#62237](https://github.com/PaddlePaddle/Paddle/pull/62237), [#64684](https://github.com/PaddlePaddle/Paddle/pull/64684) -- 增强了 46 个 API 的功能,使得已有 API 更易用,也更容易进行代码转换。包括但不限于增加 API 参数,扩展 API 支持的数据类型,以及修正原有不合理设计等。[#59890](https://github.com/PaddlePaddle/Paddle/pull/59890), [#63513](https://github.com/PaddlePaddle/Paddle/pull/63513), [#59674](https://github.com/PaddlePaddle/Paddle/pull/59674), [#62778](https://github.com/PaddlePaddle/Paddle/pull/62778), [#64110](https://github.com/PaddlePaddle/Paddle/pull/64110), [#63222](https://github.com/PaddlePaddle/Paddle/pull/63222), 
[#64331](https://github.com/PaddlePaddle/Paddle/pull/64331), [#64715](https://github.com/PaddlePaddle/Paddle/pull/64715), [#61155](https://github.com/PaddlePaddle/Paddle/pull/61155), [#60070](https://github.com/PaddlePaddle/Paddle/pull/60070), [#61974](https://github.com/PaddlePaddle/Paddle/pull/61974), [#62407](https://github.com/PaddlePaddle/Paddle/pull/62407), [#62672](https://github.com/PaddlePaddle/Paddle/pull/62672),[#62722](https://github.com/PaddlePaddle/Paddle/pull/62722), [#62876](https://github.com/PaddlePaddle/Paddle/pull/62876), [#63284](https://github.com/PaddlePaddle/Paddle/pull/63284), [#63860](https://github.com/PaddlePaddle/Paddle/pull/63860), [#60466](https://github.com/PaddlePaddle/Paddle/pull/60466), [#63690](https://github.com/PaddlePaddle/Paddle/pull/63690), [#63953](https://github.com/PaddlePaddle/Paddle/pull/63953), [#63901](https://github.com/PaddlePaddle/Paddle/pull/63901), [#62624](https://github.com/PaddlePaddle/Paddle/pull/62624), [#59857](https://github.com/PaddlePaddle/Paddle/pull/59857), [#60084](https://github.com/PaddlePaddle/Paddle/pull/60084), [#60766](https://github.com/PaddlePaddle/Paddle/pull/60766), [#62788](https://github.com/PaddlePaddle/Paddle/pull/62788), [#62937](https://github.com/PaddlePaddle/Paddle/pull/62937), [#63134](https://github.com/PaddlePaddle/Paddle/pull/63134), [#62966](https://github.com/PaddlePaddle/Paddle/pull/62966), [#63648](https://github.com/PaddlePaddle/Paddle/pull/63648), [#63881](https://github.com/PaddlePaddle/Paddle/pull/63881), [#64358](https://github.com/PaddlePaddle/Paddle/pull/64358), [#60503](https://github.com/PaddlePaddle/Paddle/pull/60503), [#63604](https://github.com/PaddlePaddle/Paddle/pull/63604), [#62338](https://github.com/PaddlePaddle/Paddle/pull/62338) -- 增强了高阶微分的单测基础设施,能够更容易地添加高阶微分的单测用例。[#62074](https://github.com/PaddlePaddle/Paddle/pull/62074) +### 新特性 -### API 性能提升 -- 对 Tensor 基础索引、高级索引和联合索引的性能进行了集中优化,在 GPU 上的计算性能较此前提升 2 到 31 倍,CPU 上提升 1.8 到 1004 
倍。[#60254](https://github.com/PaddlePaddle/Paddle/pull/60254), [#60276](https://github.com/PaddlePaddle/Paddle/pull/60276), [#60452](https://github.com/PaddlePaddle/Paddle/pull/60452), [#60771](https://github.com/PaddlePaddle/Paddle/pull/60771), [#61021](https://github.com/PaddlePaddle/Paddle/pull/61021), [#60983](https://github.com/PaddlePaddle/Paddle/pull/60983), [#61060](https://github.com/PaddlePaddle/Paddle/pull/61060), [#60618](https://github.com/PaddlePaddle/Paddle/pull/60618) +- 新增飞桨 API,扩展飞桨功能。包括 `paddle.nn.FeatureAlphaDropout`, `paddle.cartesian_prod`, `paddle.distributed.to_distributed`, `paddle.pi` 等。[#64881](https://github.com/PaddlePaddle/Paddle/pull/64881), [#65605](https://github.com/PaddlePaddle/Paddle/pull/65605), [#70757](https://github.com/PaddlePaddle/Paddle/pull/70757), [#71030](https://github.com/PaddlePaddle/Paddle/pull/71030), [#69946](https://github.com/PaddlePaddle/Paddle/pull/69946), [#70021](https://github.com/PaddlePaddle/Paddle/pull/70021), [#69613](https://github.com/PaddlePaddle/Paddle/pull/69613), [#68123](https://github.com/PaddlePaddle/Paddle/pull/68123), [#70032](https://github.com/PaddlePaddle/Paddle/pull/70032) +- 新增 Tensor 类方法和属性,及新增相关单测,使得 Tensor 更易用。[#68334](https://github.com/PaddlePaddle/Paddle/pull/68334), [#68681](https://github.com/PaddlePaddle/Paddle/pull/68681), [#69132](https://github.com/PaddlePaddle/Paddle/pull/69132), [#69270](https://github.com/PaddlePaddle/Paddle/pull/69270), [#69256](https://github.com/PaddlePaddle/Paddle/pull/69256), [#69197](https://github.com/PaddlePaddle/Paddle/pull/69197), [#69231](https://github.com/PaddlePaddle/Paddle/pull/69231), [#69222](https://github.com/PaddlePaddle/Paddle/pull/69222), [#69257](https://github.com/PaddlePaddle/Paddle/pull/69257), [#69301](https://github.com/PaddlePaddle/Paddle/pull/69301), [#69361](https://github.com/PaddlePaddle/Paddle/pull/69361), [#69348](https://github.com/PaddlePaddle/Paddle/pull/69348), 
[#69464](https://github.com/PaddlePaddle/Paddle/pull/69464), [#69542](https://github.com/PaddlePaddle/Paddle/pull/69542), [#69667](https://github.com/PaddlePaddle/Paddle/pull/69667), [#69563](https://github.com/PaddlePaddle/Paddle/pull/69563), [#69796](https://github.com/PaddlePaddle/Paddle/pull/69796), [#69477](https://github.com/PaddlePaddle/Paddle/pull/69477), [#69779](https://github.com/PaddlePaddle/Paddle/pull/69779), [#69724](https://github.com/PaddlePaddle/Paddle/pull/69724), [#69835](https://github.com/PaddlePaddle/Paddle/pull/69835), [#69781](https://github.com/PaddlePaddle/Paddle/pull/69781), [#69982](https://github.com/PaddlePaddle/Paddle/pull/69982), [#69913](https://github.com/PaddlePaddle/Paddle/pull/69913), [#70026](https://github.com/PaddlePaddle/Paddle/pull/70026), [#70013](https://github.com/PaddlePaddle/Paddle/pull/70013), [#69539](https://github.com/PaddlePaddle/Paddle/pull/69539), [#69736](https://github.com/PaddlePaddle/Paddle/pull/69736), [#69841](https://github.com/PaddlePaddle/Paddle/pull/69841), [#70277](https://github.com/PaddlePaddle/Paddle/pull/70277), [#69580](https://github.com/PaddlePaddle/Paddle/pull/69580), [#69599](https://github.com/PaddlePaddle/Paddle/pull/69599), [#69693](https://github.com/PaddlePaddle/Paddle/pull/69693), [#69848](https://github.com/PaddlePaddle/Paddle/pull/69848), [#69751](https://github.com/PaddlePaddle/Paddle/pull/69751), [#70556](https://github.com/PaddlePaddle/Paddle/pull/70556), [#70591](https://github.com/PaddlePaddle/Paddle/pull/70591), [#69673](https://github.com/PaddlePaddle/Paddle/pull/69673), [#70647](https://github.com/PaddlePaddle/Paddle/pull/70647), [#68192](https://github.com/PaddlePaddle/Paddle/pull/68192), [#68511](https://github.com/PaddlePaddle/Paddle/pull/68511), [#68833](https://github.com/PaddlePaddle/Paddle/pull/68833), [#69406](https://github.com/PaddlePaddle/Paddle/pull/69406), [#69480](https://github.com/PaddlePaddle/Paddle/pull/69480), 
[#69463](https://github.com/PaddlePaddle/Paddle/pull/69463), [#69632](https://github.com/PaddlePaddle/Paddle/pull/69632), [#69473](https://github.com/PaddlePaddle/Paddle/pull/69473), [#68694](https://github.com/PaddlePaddle/Paddle/pull/68694), [#69534](https://github.com/PaddlePaddle/Paddle/pull/69534), [#69820](https://github.com/PaddlePaddle/Paddle/pull/69820), [#70121](https://github.com/PaddlePaddle/Paddle/pull/70121) + +### API 功能增强 + +- 增强了 43 个 API 的功能,使得已有 API 更易用,也更容易进行代码转换。包括但不限于增加 API 参数,扩展 API 支持的数据类型,以及修正原有不合理设计等。[#65105](https://github.com/PaddlePaddle/Paddle/pull/65105), [#65103](https://github.com/PaddlePaddle/Paddle/pull/65103), [#62975](https://github.com/PaddlePaddle/Paddle/pull/62975), [#64436](https://github.com/PaddlePaddle/Paddle/pull/64436), [#63346](https://github.com/PaddlePaddle/Paddle/pull/63346), [#68079](https://github.com/PaddlePaddle/Paddle/pull/68079), [#67878](https://github.com/PaddlePaddle/Paddle/pull/67878), [#68432](https://github.com/PaddlePaddle/Paddle/pull/68432), [#68677](https://github.com/PaddlePaddle/Paddle/pull/68677), [#69012](https://github.com/PaddlePaddle/Paddle/pull/69012), [#69385](https://github.com/PaddlePaddle/Paddle/pull/69385), [#65032](https://github.com/PaddlePaddle/Paddle/pull/65032), [#64977](https://github.com/PaddlePaddle/Paddle/pull/64977), [#67071](https://github.com/PaddlePaddle/Paddle/pull/67071), [#67298](https://github.com/PaddlePaddle/Paddle/pull/67298), [#66687](https://github.com/PaddlePaddle/Paddle/pull/66687), [#65946](https://github.com/PaddlePaddle/Paddle/pull/65946), [#66170](https://github.com/PaddlePaddle/Paddle/pull/66170), [#66929](https://github.com/PaddlePaddle/Paddle/pull/66929), [#67994](https://github.com/PaddlePaddle/Paddle/pull/67994), [#67947](https://github.com/PaddlePaddle/Paddle/pull/67947), [#68033](https://github.com/PaddlePaddle/Paddle/pull/68033), [#68046](https://github.com/PaddlePaddle/Paddle/pull/68046), [#68294](https://github.com/PaddlePaddle/Paddle/pull/68294), 
[#68214](https://github.com/PaddlePaddle/Paddle/pull/68214), [#68281](https://github.com/PaddlePaddle/Paddle/pull/68281), [#68390](https://github.com/PaddlePaddle/Paddle/pull/68390), [#68772](https://github.com/PaddlePaddle/Paddle/pull/68772), [#69451](https://github.com/PaddlePaddle/Paddle/pull/69451), [#69252](https://github.com/PaddlePaddle/Paddle/pull/69252), [#69529](https://github.com/PaddlePaddle/Paddle/pull/69529), [#69750](https://github.com/PaddlePaddle/Paddle/pull/69750), [#69827](https://github.com/PaddlePaddle/Paddle/pull/69827), [#69099](https://github.com/PaddlePaddle/Paddle/pull/69099), [#68594](https://github.com/PaddlePaddle/Paddle/pull/68594), [#70090](https://github.com/PaddlePaddle/Paddle/pull/70090), [#70228](https://github.com/PaddlePaddle/Paddle/pull/70228), [#70166](https://github.com/PaddlePaddle/Paddle/pull/70166), [#70389](https://github.com/PaddlePaddle/Paddle/pull/70389), [#70790](https://github.com/PaddlePaddle/Paddle/pull/70790), [#71029](https://github.com/PaddlePaddle/Paddle/pull/71029), [#71283](https://github.com/PaddlePaddle/Paddle/pull/71283), [#71342](https://github.com/PaddlePaddle/Paddle/pull/71342) +- 飞桨 Python API 全面支持类型提示。所有 Python API 的参数和返回值都添加了类型提示,以便于开发和使用。[#65209](https://github.com/PaddlePaddle/Paddle/pull/65209), [#65201](https://github.com/PaddlePaddle/Paddle/pull/65201), [#65190](https://github.com/PaddlePaddle/Paddle/pull/65190), [#65082](https://github.com/PaddlePaddle/Paddle/pull/65082), [#65226](https://github.com/PaddlePaddle/Paddle/pull/65226), [#65076](https://github.com/PaddlePaddle/Paddle/pull/65076), [#65238](https://github.com/PaddlePaddle/Paddle/pull/65238), [#65236](https://github.com/PaddlePaddle/Paddle/pull/65236), [#65247](https://github.com/PaddlePaddle/Paddle/pull/65247), [#65249](https://github.com/PaddlePaddle/Paddle/pull/65249), [#65244](https://github.com/PaddlePaddle/Paddle/pull/65244), [#65272](https://github.com/PaddlePaddle/Paddle/pull/65272), 
[#65191](https://github.com/PaddlePaddle/Paddle/pull/65191), [#65290](https://github.com/PaddlePaddle/Paddle/pull/65290), [#65255](https://github.com/PaddlePaddle/Paddle/pull/65255), [#65292](https://github.com/PaddlePaddle/Paddle/pull/65292), [#65300](https://github.com/PaddlePaddle/Paddle/pull/65300), [#65301](https://github.com/PaddlePaddle/Paddle/pull/65301), [#65332](https://github.com/PaddlePaddle/Paddle/pull/65332), [#65323](https://github.com/PaddlePaddle/Paddle/pull/65323), [#65326](https://github.com/PaddlePaddle/Paddle/pull/65326), [#65273](https://github.com/PaddlePaddle/Paddle/pull/65273), [#65317](https://github.com/PaddlePaddle/Paddle/pull/65317), [#65354](https://github.com/PaddlePaddle/Paddle/pull/65354), [#65283](https://github.com/PaddlePaddle/Paddle/pull/65283), [#65372](https://github.com/PaddlePaddle/Paddle/pull/65372), [#65337](https://github.com/PaddlePaddle/Paddle/pull/65337), [#65085](https://github.com/PaddlePaddle/Paddle/pull/65085), [#65382](https://github.com/PaddlePaddle/Paddle/pull/65382), [#65381](https://github.com/PaddlePaddle/Paddle/pull/65381), [#65378](https://github.com/PaddlePaddle/Paddle/pull/65378), [#65274](https://github.com/PaddlePaddle/Paddle/pull/65274), [#65380](https://github.com/PaddlePaddle/Paddle/pull/65380), [#65386](https://github.com/PaddlePaddle/Paddle/pull/65386), [#65351](https://github.com/PaddlePaddle/Paddle/pull/65351), [#65284](https://github.com/PaddlePaddle/Paddle/pull/65284), [#65366](https://github.com/PaddlePaddle/Paddle/pull/65366), [#65308](https://github.com/PaddlePaddle/Paddle/pull/65308), [#65375](https://github.com/PaddlePaddle/Paddle/pull/65375), [#65376](https://github.com/PaddlePaddle/Paddle/pull/65376), [#65464](https://github.com/PaddlePaddle/Paddle/pull/65464), [#65197](https://github.com/PaddlePaddle/Paddle/pull/65197), [#65455](https://github.com/PaddlePaddle/Paddle/pull/65455), [#65457](https://github.com/PaddlePaddle/Paddle/pull/65457), 
[#65487](https://github.com/PaddlePaddle/Paddle/pull/65487), [#65486](https://github.com/PaddlePaddle/Paddle/pull/65486), [#65547](https://github.com/PaddlePaddle/Paddle/pull/65547), [#65504](https://github.com/PaddlePaddle/Paddle/pull/65504), [#65460](https://github.com/PaddlePaddle/Paddle/pull/65460), [#65183](https://github.com/PaddlePaddle/Paddle/pull/65183), [#65454](https://github.com/PaddlePaddle/Paddle/pull/65454), [#65559](https://github.com/PaddlePaddle/Paddle/pull/65559), [#65560](https://github.com/PaddlePaddle/Paddle/pull/65560), [#65570](https://github.com/PaddlePaddle/Paddle/pull/65570), [#65569](https://github.com/PaddlePaddle/Paddle/pull/65569), [#65566](https://github.com/PaddlePaddle/Paddle/pull/65566), [#65620](https://github.com/PaddlePaddle/Paddle/pull/65620), [#65568](https://github.com/PaddlePaddle/Paddle/pull/65568), [#65567](https://github.com/PaddlePaddle/Paddle/pull/65567), [#65660](https://github.com/PaddlePaddle/Paddle/pull/65660), [#65645](https://github.com/PaddlePaddle/Paddle/pull/65645), [#65600](https://github.com/PaddlePaddle/Paddle/pull/65600), [#65532](https://github.com/PaddlePaddle/Paddle/pull/65532), [#65765](https://github.com/PaddlePaddle/Paddle/pull/65765), [#65767](https://github.com/PaddlePaddle/Paddle/pull/65767), [#65770](https://github.com/PaddlePaddle/Paddle/pull/65770), [#65768](https://github.com/PaddlePaddle/Paddle/pull/65768), [#65771](https://github.com/PaddlePaddle/Paddle/pull/65771), [#65772](https://github.com/PaddlePaddle/Paddle/pull/65772), [#65774](https://github.com/PaddlePaddle/Paddle/pull/65774), [#65769](https://github.com/PaddlePaddle/Paddle/pull/65769), [#65773](https://github.com/PaddlePaddle/Paddle/pull/65773), [#65766](https://github.com/PaddlePaddle/Paddle/pull/65766), [#65776](https://github.com/PaddlePaddle/Paddle/pull/65776), [#65775](https://github.com/PaddlePaddle/Paddle/pull/65775), [#65755](https://github.com/PaddlePaddle/Paddle/pull/65755), 
[#65779](https://github.com/PaddlePaddle/Paddle/pull/65779), [#65777](https://github.com/PaddlePaddle/Paddle/pull/65777), [#65823](https://github.com/PaddlePaddle/Paddle/pull/65823), [#65807](https://github.com/PaddlePaddle/Paddle/pull/65807), [#65821](https://github.com/PaddlePaddle/Paddle/pull/65821), [#65819](https://github.com/PaddlePaddle/Paddle/pull/65819), [#65810](https://github.com/PaddlePaddle/Paddle/pull/65810), [#65808](https://github.com/PaddlePaddle/Paddle/pull/65808), [#65824](https://github.com/PaddlePaddle/Paddle/pull/65824), [#65553](https://github.com/PaddlePaddle/Paddle/pull/65553), [#65818](https://github.com/PaddlePaddle/Paddle/pull/65818), [#65812](https://github.com/PaddlePaddle/Paddle/pull/65812), [#65803](https://github.com/PaddlePaddle/Paddle/pull/65803), [#65865](https://github.com/PaddlePaddle/Paddle/pull/65865), [#65870](https://github.com/PaddlePaddle/Paddle/pull/65870), [#65866](https://github.com/PaddlePaddle/Paddle/pull/65866), [#65844](https://github.com/PaddlePaddle/Paddle/pull/65844), [#65845](https://github.com/PaddlePaddle/Paddle/pull/65845), [#65853](https://github.com/PaddlePaddle/Paddle/pull/65853), [#65874](https://github.com/PaddlePaddle/Paddle/pull/65874), [#65871](https://github.com/PaddlePaddle/Paddle/pull/65871), [#65809](https://github.com/PaddlePaddle/Paddle/pull/65809), [#65867](https://github.com/PaddlePaddle/Paddle/pull/65867), [#65822](https://github.com/PaddlePaddle/Paddle/pull/65822), [#65872](https://github.com/PaddlePaddle/Paddle/pull/65872), [#65873](https://github.com/PaddlePaddle/Paddle/pull/65873), [#65869](https://github.com/PaddlePaddle/Paddle/pull/65869), [#65868](https://github.com/PaddlePaddle/Paddle/pull/65868), [#65849](https://github.com/PaddlePaddle/Paddle/pull/65849), [#65875](https://github.com/PaddlePaddle/Paddle/pull/65875), [#65876](https://github.com/PaddlePaddle/Paddle/pull/65876), [#65843](https://github.com/PaddlePaddle/Paddle/pull/65843), 
[#65727](https://github.com/PaddlePaddle/Paddle/pull/65727), [#65587](https://github.com/PaddlePaddle/Paddle/pull/65587), [#66006](https://github.com/PaddlePaddle/Paddle/pull/66006), [#66005](https://github.com/PaddlePaddle/Paddle/pull/66005), [#65785](https://github.com/PaddlePaddle/Paddle/pull/65785), [#65784](https://github.com/PaddlePaddle/Paddle/pull/65784), [#65811](https://github.com/PaddlePaddle/Paddle/pull/65811), [#65919](https://github.com/PaddlePaddle/Paddle/pull/65919), [#65838](https://github.com/PaddlePaddle/Paddle/pull/65838), [#65852](https://github.com/PaddlePaddle/Paddle/pull/65852), [#65847](https://github.com/PaddlePaddle/Paddle/pull/65847), [#66014](https://github.com/PaddlePaddle/Paddle/pull/66014), [#65805](https://github.com/PaddlePaddle/Paddle/pull/65805), [#66009](https://github.com/PaddlePaddle/Paddle/pull/66009), [#66012](https://github.com/PaddlePaddle/Paddle/pull/66012), [#65633](https://github.com/PaddlePaddle/Paddle/pull/65633), [#66011](https://github.com/PaddlePaddle/Paddle/pull/66011), [#66010](https://github.com/PaddlePaddle/Paddle/pull/66010), [#66013](https://github.com/PaddlePaddle/Paddle/pull/66013), [#66015](https://github.com/PaddlePaddle/Paddle/pull/66015), [#66016](https://github.com/PaddlePaddle/Paddle/pull/66016), [#66030](https://github.com/PaddlePaddle/Paddle/pull/66030), [#66028](https://github.com/PaddlePaddle/Paddle/pull/66028), [#66029](https://github.com/PaddlePaddle/Paddle/pull/66029), [#66054](https://github.com/PaddlePaddle/Paddle/pull/66054), [#66040](https://github.com/PaddlePaddle/Paddle/pull/66040), [#65993](https://github.com/PaddlePaddle/Paddle/pull/65993), [#66058](https://github.com/PaddlePaddle/Paddle/pull/66058), [#66280](https://github.com/PaddlePaddle/Paddle/pull/66280), [#66037](https://github.com/PaddlePaddle/Paddle/pull/66037), [#66057](https://github.com/PaddlePaddle/Paddle/pull/66057), [#66077](https://github.com/PaddlePaddle/Paddle/pull/66077), 
[#66051](https://github.com/PaddlePaddle/Paddle/pull/66051), [#65912](https://github.com/PaddlePaddle/Paddle/pull/65912), [#66090](https://github.com/PaddlePaddle/Paddle/pull/66090), [#66189](https://github.com/PaddlePaddle/Paddle/pull/66189), [#66127](https://github.com/PaddlePaddle/Paddle/pull/66127), [#66277](https://github.com/PaddlePaddle/Paddle/pull/66277), [#66119](https://github.com/PaddlePaddle/Paddle/pull/66119), [#66270](https://github.com/PaddlePaddle/Paddle/pull/66270), [#66305](https://github.com/PaddlePaddle/Paddle/pull/66305), [#66306](https://github.com/PaddlePaddle/Paddle/pull/66306), [#66279](https://github.com/PaddlePaddle/Paddle/pull/66279), [#66276](https://github.com/PaddlePaddle/Paddle/pull/66276), [#66295](https://github.com/PaddlePaddle/Paddle/pull/66295), [#66301](https://github.com/PaddlePaddle/Paddle/pull/66301), [#66473](https://github.com/PaddlePaddle/Paddle/pull/66473), [#66384](https://github.com/PaddlePaddle/Paddle/pull/66384), [#66505](https://github.com/PaddlePaddle/Paddle/pull/66505), [#66328](https://github.com/PaddlePaddle/Paddle/pull/66328), [#66394](https://github.com/PaddlePaddle/Paddle/pull/66394), [#66392](https://github.com/PaddlePaddle/Paddle/pull/66392), [#66432](https://github.com/PaddlePaddle/Paddle/pull/66432), [#66575](https://github.com/PaddlePaddle/Paddle/pull/66575), [#66572](https://github.com/PaddlePaddle/Paddle/pull/66572), [#66656](https://github.com/PaddlePaddle/Paddle/pull/66656), [#66475](https://github.com/PaddlePaddle/Paddle/pull/66475), [#66654](https://github.com/PaddlePaddle/Paddle/pull/66654), [#66616](https://github.com/PaddlePaddle/Paddle/pull/66616), [#66694](https://github.com/PaddlePaddle/Paddle/pull/66694), [#66686](https://github.com/PaddlePaddle/Paddle/pull/66686), [#66766](https://github.com/PaddlePaddle/Paddle/pull/66766), [#66749](https://github.com/PaddlePaddle/Paddle/pull/66749), [#66760](https://github.com/PaddlePaddle/Paddle/pull/66760), 
[#66803](https://github.com/PaddlePaddle/Paddle/pull/66803), [#66770](https://github.com/PaddlePaddle/Paddle/pull/66770), [#66693](https://github.com/PaddlePaddle/Paddle/pull/66693), [#66771](https://github.com/PaddlePaddle/Paddle/pull/66771), [#66792](https://github.com/PaddlePaddle/Paddle/pull/66792), [#66862](https://github.com/PaddlePaddle/Paddle/pull/66862), [#66867](https://github.com/PaddlePaddle/Paddle/pull/66867), [#66684](https://github.com/PaddlePaddle/Paddle/pull/66684), [#66966](https://github.com/PaddlePaddle/Paddle/pull/66966), [#66793](https://github.com/PaddlePaddle/Paddle/pull/66793), [#66987](https://github.com/PaddlePaddle/Paddle/pull/66987), [#66985](https://github.com/PaddlePaddle/Paddle/pull/66985), [#66989](https://github.com/PaddlePaddle/Paddle/pull/66989), [#66639](https://github.com/PaddlePaddle/Paddle/pull/66639), [#66994](https://github.com/PaddlePaddle/Paddle/pull/66994), [#66986](https://github.com/PaddlePaddle/Paddle/pull/66986), [#66993](https://github.com/PaddlePaddle/Paddle/pull/66993), [#67002](https://github.com/PaddlePaddle/Paddle/pull/67002), [#66996](https://github.com/PaddlePaddle/Paddle/pull/66996), [#67001](https://github.com/PaddlePaddle/Paddle/pull/67001), [#66864](https://github.com/PaddlePaddle/Paddle/pull/66864), [#67031](https://github.com/PaddlePaddle/Paddle/pull/67031), [#67089](https://github.com/PaddlePaddle/Paddle/pull/67089), [#67143](https://github.com/PaddlePaddle/Paddle/pull/67143), [#67179](https://github.com/PaddlePaddle/Paddle/pull/67179), [#67178](https://github.com/PaddlePaddle/Paddle/pull/67178), [#67284](https://github.com/PaddlePaddle/Paddle/pull/67284), [#67104](https://github.com/PaddlePaddle/Paddle/pull/67104), [#67079](https://github.com/PaddlePaddle/Paddle/pull/67079), [#67132](https://github.com/PaddlePaddle/Paddle/pull/67132), [#67147](https://github.com/PaddlePaddle/Paddle/pull/67147), [#67204](https://github.com/PaddlePaddle/Paddle/pull/67204), 
[#67112](https://github.com/PaddlePaddle/Paddle/pull/67112), [#67233](https://github.com/PaddlePaddle/Paddle/pull/67233), [#67366](https://github.com/PaddlePaddle/Paddle/pull/67366), [#67067](https://github.com/PaddlePaddle/Paddle/pull/67067), [#67391](https://github.com/PaddlePaddle/Paddle/pull/67391), [#67428](https://github.com/PaddlePaddle/Paddle/pull/67428), [#67197](https://github.com/PaddlePaddle/Paddle/pull/67197), [#67047](https://github.com/PaddlePaddle/Paddle/pull/67047), [#66890](https://github.com/PaddlePaddle/Paddle/pull/66890), [#67159](https://github.com/PaddlePaddle/Paddle/pull/67159), [#67439](https://github.com/PaddlePaddle/Paddle/pull/67439), [#67555](https://github.com/PaddlePaddle/Paddle/pull/67555), [#67448](https://github.com/PaddlePaddle/Paddle/pull/67448), [#67556](https://github.com/PaddlePaddle/Paddle/pull/67556), [#67469](https://github.com/PaddlePaddle/Paddle/pull/67469), [#67558](https://github.com/PaddlePaddle/Paddle/pull/67558), [#67405](https://github.com/PaddlePaddle/Paddle/pull/67405), [#67644](https://github.com/PaddlePaddle/Paddle/pull/67644), [#67624](https://github.com/PaddlePaddle/Paddle/pull/67624), [#67679](https://github.com/PaddlePaddle/Paddle/pull/67679), [#67677](https://github.com/PaddlePaddle/Paddle/pull/67677), [#67785](https://github.com/PaddlePaddle/Paddle/pull/67785), [#67767](https://github.com/PaddlePaddle/Paddle/pull/67767), [#65319](https://github.com/PaddlePaddle/Paddle/pull/65319), [#65277](https://github.com/PaddlePaddle/Paddle/pull/65277), [#67673](https://github.com/PaddlePaddle/Paddle/pull/67673), [#65557](https://github.com/PaddlePaddle/Paddle/pull/65557), [#67527](https://github.com/PaddlePaddle/Paddle/pull/67527), [#66965](https://github.com/PaddlePaddle/Paddle/pull/66965), [#65905](https://github.com/PaddlePaddle/Paddle/pull/65905), [#65657](https://github.com/PaddlePaddle/Paddle/pull/65657), [#66357](https://github.com/PaddlePaddle/Paddle/pull/66357), 
[#68163](https://github.com/PaddlePaddle/Paddle/pull/68163) +- 优化了较多飞桨 API 的报错信息,使得报错更易懂。[#67148](https://github.com/PaddlePaddle/Paddle/pull/67148), [#67154](https://github.com/PaddlePaddle/Paddle/pull/67154), [#67546](https://github.com/PaddlePaddle/Paddle/pull/67546), [#67335](https://github.com/PaddlePaddle/Paddle/pull/67335), [#67255](https://github.com/PaddlePaddle/Paddle/pull/67255), [#67099](https://github.com/PaddlePaddle/Paddle/pull/67099), [#67074](https://github.com/PaddlePaddle/Paddle/pull/67074), [#67073](https://github.com/PaddlePaddle/Paddle/pull/67073), [#66957](https://github.com/PaddlePaddle/Paddle/pull/66957), [#67063](https://github.com/PaddlePaddle/Paddle/pull/67063), [#67575](https://github.com/PaddlePaddle/Paddle/pull/67575), [#67608](https://github.com/PaddlePaddle/Paddle/pull/67608), [#67634](https://github.com/PaddlePaddle/Paddle/pull/67634), [#67325](https://github.com/PaddlePaddle/Paddle/pull/67325), [#67429](https://github.com/PaddlePaddle/Paddle/pull/67429), [#67401](https://github.com/PaddlePaddle/Paddle/pull/67401), [#66881](https://github.com/PaddlePaddle/Paddle/pull/66881), [#68492](https://github.com/PaddlePaddle/Paddle/pull/68492), [#67695](https://github.com/PaddlePaddle/Paddle/pull/67695), [#69833](https://github.com/PaddlePaddle/Paddle/pull/69833), [#70398](https://github.com/PaddlePaddle/Paddle/pull/70398) ### Bug 修复 -- 修复 `paddle.optimizer.LBFGS` 中使用非 Tensor 进行计算导致的报错。 [#60219](https://github.com/PaddlePaddle/Paddle/pull/60219) -- 修复 `paddle.optimizer.LBFGS` 中随机数不能固定的问题。 [#60591](https://github.com/PaddlePaddle/Paddle/pull/60591) -- 修复 `set_value` 算子梯度计算不正确的问题。 [#59034](https://github.com/PaddlePaddle/Paddle/pull/59034) -- 修复 Tensor 基础索引适配 PIR 的问题。 [#60259](https://github.com/PaddlePaddle/Paddle/pull/60259), [#61103](https://github.com/PaddlePaddle/Paddle/pull/61103) -- 修复 Tensor 联合索引赋值时的问题。[#60376](https://github.com/PaddlePaddle/Paddle/issues/60376), [#60447](https://github.com/PaddlePaddle/Paddle/pull/60447) -- 修复 Tensor 
联合索引取值时的问题。 [#61922](https://github.com/PaddlePaddle/Paddle/pull/61922) -- 修复 `paddle.flatten` stride 计算错误问题,并能够新增`paddle.flatten_` 。[#63084](https://github.com/PaddlePaddle/Paddle/pull/63084) -- 修复 `paddle.index_fill` 和 `paddle.index_fill_` 结果不一致问题。 [#59863](https://github.com/PaddlePaddle/Paddle/pull/59863) -- 修复 `paddle.masked_scatter`报错问题。 [#60835](https://github.com/PaddlePaddle/Paddle/pull/60835) -- 修复 `paddle.histogramdd` cpu 报错问题。 [#61891](https://github.com/PaddlePaddle/Paddle/pull/61891) -- 修复 `paddle.cast_` 在 cpu 上连续使用导致结果错误的 bug。 [#60054](https://github.com/PaddlePaddle/Paddle/pull/60054) -- 修复 `paddle.put_along_axis` 在输入 size 很大的时候存在 bug 的问题。 [#60551](https://github.com/PaddlePaddle/Paddle/pull/60551) -- 修复 `paddle.nanmedian` cpu 报错问题。 [#63221](https://github.com/PaddlePaddle/Paddle/pull/63221) -- 修复 `paddle.median` 在 min 分支下不支持输入为除浮点类型以外的类型。 [#64444](https://github.com/PaddlePaddle/Paddle/pull/64444) -- 修复 分布式场景中的 dataloader 问题。 [#62696](https://github.com/PaddlePaddle/Paddle/pull/62696), [#63378](https://github.com/PaddlePaddle/Paddle/pull/63378) -- 修复 error 提示的格式问题。 [#63106](https://github.com/PaddlePaddle/Paddle/pull/63106), [#63144](https://github.com/PaddlePaddle/Paddle/pull/63144) -- 修复 GLOG_v>=6 下格式问题。 [#63345](https://github.com/PaddlePaddle/Paddle/pull/63345) - -### 安全改善 -- 增强对 parent_ids 的检查。 [#62826](https://github.com/PaddlePaddle/Paddle/pull/62826) - -## 2.基础执行架构 - -PIR 基础功能全面升级完善,成熟度大幅提升,基于 PIR 使飞桨基础架构设计更合理、保证了框架卓越的性能表现和良好的拓展性。在此版本中,完成了 PIR 多场景的推全验证:单机场景完成动转静场景 PIR 后端切换;推理场景完成全部存量模型验证,并在 84.2%模型有 10%+收益;完成分布式场景基于 PIR 的验证。同时基于 PIR 完成控制流、backward 逻辑、save/load、OneDNN 适配等核心模块的开发验证,为飞桨 PIR 切换为默认模式,奠定了坚实的基础。对飞桨框架算子体系的功能完备性、执行效率和稳定性进一步提升,给开发者带来更好的使用和开发体验。 -### 功能优化 -- 完善 PIR 的基础功能,包含基础的类型系统增强、调试、打印、Pass 开发、AMP 支持等,提升 PIR 的研发效率。[#60723](https://github.com/PaddlePaddle/Paddle/pull/60723), [#60677](https://github.com/PaddlePaddle/Paddle/pull/60677), [#60783](https://github.com/PaddlePaddle/Paddle/pull/60783), 
[#60798](https://github.com/PaddlePaddle/Paddle/pull/60798), [#61053](https://github.com/PaddlePaddle/Paddle/pull/61053), [#61366](https://github.com/PaddlePaddle/Paddle/pull/61366), [#61446](https://github.com/PaddlePaddle/Paddle/pull/61446), [#60024](https://github.com/PaddlePaddle/Paddle/pull/60024), [#59939](https://github.com/PaddlePaddle/Paddle/pull/59939), [#63376](https://github.com/PaddlePaddle/Paddle/pull/63376), [#61853](https://github.com/PaddlePaddle/Paddle/pull/61853), [#63914](https://github.com/PaddlePaddle/Paddle/pull/63914), [#60170](https://github.com/PaddlePaddle/Paddle/pull/60170), [#60678](https://github.com/PaddlePaddle/Paddle/pull/60678), [#64093](https://github.com/PaddlePaddle/Paddle/pull/64093), [#64065](https://github.com/PaddlePaddle/Paddle/pull/64065), [#62451](https://github.com/PaddlePaddle/Paddle/pull/62451), [#59784](https://github.com/PaddlePaddle/Paddle/pull/59784), [#60136](https://github.com/PaddlePaddle/Paddle/pull/60136), [#63336](https://github.com/PaddlePaddle/Paddle/pull/63336), [#62108](https://github.com/PaddlePaddle/Paddle/pull/62108), [#60860](https://github.com/PaddlePaddle/Paddle/pull/60860), [#60536](https://github.com/PaddlePaddle/Paddle/pull/60536), [#60590](https://github.com/PaddlePaddle/Paddle/pull/60590), [#60752](https://github.com/PaddlePaddle/Paddle/pull/60752), [#61435](https://github.com/PaddlePaddle/Paddle/pull/61435), [#62977](https://github.com/PaddlePaddle/Paddle/pull/62977), [#62139](https://github.com/PaddlePaddle/Paddle/pull/62139), [#60432](https://github.com/PaddlePaddle/Paddle/pull/60432), [#61452](https://github.com/PaddlePaddle/Paddle/pull/61452), [#61978](https://github.com/PaddlePaddle/Paddle/pull/61978), [#62262](https://github.com/PaddlePaddle/Paddle/pull/62262), [#62422](https://github.com/PaddlePaddle/Paddle/pull/62422), [#60359](https://github.com/PaddlePaddle/Paddle/pull/60359), [#62989](https://github.com/PaddlePaddle/Paddle/pull/62989), 
[#61297](https://github.com/PaddlePaddle/Paddle/pull/61297), [#61399](https://github.com/PaddlePaddle/Paddle/pull/61399), [#61871](https://github.com/PaddlePaddle/Paddle/pull/61871), [#61496](https://github.com/PaddlePaddle/Paddle/pull/61496), [#62413](https://github.com/PaddlePaddle/Paddle/pull/62413) -- 优化飞桨执行器执行逻辑,完善 Pass 体系,提升训推性能表现,并更好的支持分布式并行的逻辑运行。 [#60182](https://github.com/PaddlePaddle/Paddle/pull/60182), [#60516](https://github.com/PaddlePaddle/Paddle/pull/60516), [#63573](https://github.com/PaddlePaddle/Paddle/pull/63573), [#60181](https://github.com/PaddlePaddle/Paddle/pull/60181), [#59792](https://github.com/PaddlePaddle/Paddle/pull/59792), [#62025](https://github.com/PaddlePaddle/Paddle/pull/62025), [#61160](https://github.com/PaddlePaddle/Paddle/pull/61160), [#61188](https://github.com/PaddlePaddle/Paddle/pull/61188), [#61277](https://github.com/PaddlePaddle/Paddle/pull/61277), [#61669](https://github.com/PaddlePaddle/Paddle/pull/61669), [#60823](https://github.com/PaddlePaddle/Paddle/pull/60823), [#61310](https://github.com/PaddlePaddle/Paddle/pull/61310), [#60892](https://github.com/PaddlePaddle/Paddle/pull/60892), [#60578](https://github.com/PaddlePaddle/Paddle/pull/60578), [#61657](https://github.com/PaddlePaddle/Paddle/pull/61657), [#62638](https://github.com/PaddlePaddle/Paddle/pull/62638), [#63960](https://github.com/PaddlePaddle/Paddle/pull/63960), [#64234](https://github.com/PaddlePaddle/Paddle/pull/64234) - -### PIR 新功能 -- 基于 PIR 实现反向逻辑,直接生成反向计算图,同时支持高阶微分。 [#60174](https://github.com/PaddlePaddle/Paddle/pull/60174), [#60328](https://github.com/PaddlePaddle/Paddle/pull/60328), [#60818](https://github.com/PaddlePaddle/Paddle/pull/60818), [#61352](https://github.com/PaddlePaddle/Paddle/pull/61352), [#61661](https://github.com/PaddlePaddle/Paddle/pull/61661), [#61927](https://github.com/PaddlePaddle/Paddle/pull/61927), [#62772](https://github.com/PaddlePaddle/Paddle/pull/62772), [#60360](https://github.com/PaddlePaddle/Paddle/pull/60360), 
[#60866](https://github.com/PaddlePaddle/Paddle/pull/60866), [#60970](https://github.com/PaddlePaddle/Paddle/pull/60970), [#60810](https://github.com/PaddlePaddle/Paddle/pull/60810), [#64696](https://github.com/PaddlePaddle/Paddle/pull/64696), [#59844](https://github.com/PaddlePaddle/Paddle/pull/59844), [#59999](https://github.com/PaddlePaddle/Paddle/pull/59999), [#60262](https://github.com/PaddlePaddle/Paddle/pull/60262), [#60338](https://github.com/PaddlePaddle/Paddle/pull/60338), [#59935](https://github.com/PaddlePaddle/Paddle/pull/59935), [#59982](https://github.com/PaddlePaddle/Paddle/pull/59982), [#60221](https://github.com/PaddlePaddle/Paddle/pull/60221), [#62621](https://github.com/PaddlePaddle/Paddle/pull/62621), [#60044](https://github.com/PaddlePaddle/Paddle/pull/60044), [#59790](https://github.com/PaddlePaddle/Paddle/pull/59790), [#60529](https://github.com/PaddlePaddle/Paddle/pull/60529), [#61378](https://github.com/PaddlePaddle/Paddle/pull/61378), [#61584](https://github.com/PaddlePaddle/Paddle/pull/61584) -- 基于 PIR 实现控制流逻辑,提升 PIR 的表达能力,更好的支持训练和推理等多场景业务。[#61396](https://github.com/PaddlePaddle/Paddle/pull/61396), [#64045](https://github.com/PaddlePaddle/Paddle/pull/64045), [#60953](https://github.com/PaddlePaddle/Paddle/pull/60953), [#61091](https://github.com/PaddlePaddle/Paddle/pull/61091), [#61304](https://github.com/PaddlePaddle/Paddle/pull/61304), [#62093](https://github.com/PaddlePaddle/Paddle/pull/62093), [#64710](https://github.com/PaddlePaddle/Paddle/pull/64710), [#60668](https://github.com/PaddlePaddle/Paddle/pull/60668), [#60433](https://github.com/PaddlePaddle/Paddle/pull/60433), [#60963](https://github.com/PaddlePaddle/Paddle/pull/60963), [#61192](https://github.com/PaddlePaddle/Paddle/pull/61192), [#60895](https://github.com/PaddlePaddle/Paddle/pull/60895), [#60017](https://github.com/PaddlePaddle/Paddle/pull/60017), [#60369](https://github.com/PaddlePaddle/Paddle/pull/60369), [#60330](https://github.com/PaddlePaddle/Paddle/pull/60330), 
[#60364](https://github.com/PaddlePaddle/Paddle/pull/60364), [#61416](https://github.com/PaddlePaddle/Paddle/pull/61416), [#60460](https://github.com/PaddlePaddle/Paddle/pull/60460), [#60703](https://github.com/PaddlePaddle/Paddle/pull/60703), [#61027](https://github.com/PaddlePaddle/Paddle/pull/61027) -- 基于 PIR 实现 save/load 逻辑,打通 PIR 和上下游训练和推理业务的流程。 [#63438](https://github.com/PaddlePaddle/Paddle/pull/63438), [#63574](https://github.com/PaddlePaddle/Paddle/pull/63574), [#64281](https://github.com/PaddlePaddle/Paddle/pull/64281), [#64327](https://github.com/PaddlePaddle/Paddle/pull/64327), [#63622](https://github.com/PaddlePaddle/Paddle/pull/63622), [#64507](https://github.com/PaddlePaddle/Paddle/pull/64507), [#63389](https://github.com/PaddlePaddle/Paddle/pull/63389), [#63539](https://github.com/PaddlePaddle/Paddle/pull/63539), [#63749](https://github.com/PaddlePaddle/Paddle/pull/63749), [#63957](https://github.com/PaddlePaddle/Paddle/pull/63957), [#64044](https://github.com/PaddlePaddle/Paddle/pull/64044), [#64121](https://github.com/PaddlePaddle/Paddle/pull/64121), [#64239](https://github.com/PaddlePaddle/Paddle/pull/64239), [#63818](https://github.com/PaddlePaddle/Paddle/pull/63818), [#63910](https://github.com/PaddlePaddle/Paddle/pull/63910),[#63380](https://github.com/PaddlePaddle/Paddle/pull/63380),[#63275](https://github.com/PaddlePaddle/Paddle/pull/63275),[#63663](https://github.com/PaddlePaddle/Paddle/pull/63663),[#64692](https://github.com/PaddlePaddle/Paddle/pull/64692),[#63958](https://github.com/PaddlePaddle/Paddle/pull/63958) -- 完成 OneDNN 相关基础功能开发和验证,为 OneDNN 全面切换做准备。 [#60680](https://github.com/PaddlePaddle/Paddle/pull/60680), [#60665](https://github.com/PaddlePaddle/Paddle/pull/60665), [#63162](https://github.com/PaddlePaddle/Paddle/pull/63162), [#59917](https://github.com/PaddlePaddle/Paddle/pull/59917), [#62901](https://github.com/PaddlePaddle/Paddle/pull/62901), 
[#59918](https://github.com/PaddlePaddle/Paddle/pull/59918), [#60257](https://github.com/PaddlePaddle/Paddle/pull/60257), [#60502](https://github.com/PaddlePaddle/Paddle/pull/60502), [#61062](https://github.com/PaddlePaddle/Paddle/pull/61062), [#61170](https://github.com/PaddlePaddle/Paddle/pull/61170), [#61474](https://github.com/PaddlePaddle/Paddle/pull/61474), [#60874](https://github.com/PaddlePaddle/Paddle/pull/60874), [#61495](https://github.com/PaddlePaddle/Paddle/pull/61495), [#61664](https://github.com/PaddlePaddle/Paddle/pull/61664), [#61649](https://github.com/PaddlePaddle/Paddle/pull/61649), [#61592](https://github.com/PaddlePaddle/Paddle/pull/61592), [#61667](https://github.com/PaddlePaddle/Paddle/pull/61667), [#61137](https://github.com/PaddlePaddle/Paddle/pull/61137), [#60952](https://github.com/PaddlePaddle/Paddle/pull/60952), [#61651](https://github.com/PaddlePaddle/Paddle/pull/61651), [#62126](https://github.com/PaddlePaddle/Paddle/pull/62126), [#62187](https://github.com/PaddlePaddle/Paddle/pull/62187), [#61307](https://github.com/PaddlePaddle/Paddle/pull/61307), [#62734](https://github.com/PaddlePaddle/Paddle/pull/62734), [#60974](https://github.com/PaddlePaddle/Paddle/pull/60974), [#61451](https://github.com/PaddlePaddle/Paddle/pull/61451), [#61011](https://github.com/PaddlePaddle/Paddle/pull/61011), [#61218](https://github.com/PaddlePaddle/Paddle/pull/61218), [#61623](https://github.com/PaddlePaddle/Paddle/pull/61623), [#61893](https://github.com/PaddlePaddle/Paddle/pull/61893), [#61876](https://github.com/PaddlePaddle/Paddle/pull/61876), [#61892](https://github.com/PaddlePaddle/Paddle/pull/61892), [#62085](https://github.com/PaddlePaddle/Paddle/pull/62085), [#62220](https://github.com/PaddlePaddle/Paddle/pull/62220), [#62244](https://github.com/PaddlePaddle/Paddle/pull/62244), [#62265](https://github.com/PaddlePaddle/Paddle/pull/62265), [#60754](https://github.com/PaddlePaddle/Paddle/pull/60754), 
[#60896](https://github.com/PaddlePaddle/Paddle/pull/60896), [#61868](https://github.com/PaddlePaddle/Paddle/pull/61868), [#61659](https://github.com/PaddlePaddle/Paddle/pull/61659), [#62241](https://github.com/PaddlePaddle/Paddle/pull/62241), [#62471](https://github.com/PaddlePaddle/Paddle/pull/62471), [#61165](https://github.com/PaddlePaddle/Paddle/pull/61165),[#64441](https://github.com/PaddlePaddle/Paddle/pull/64441),[#63141](https://github.com/PaddlePaddle/Paddle/pull/63141),[#63145](https://github.com/PaddlePaddle/Paddle/pull/63145),[#63592](https://github.com/PaddlePaddle/Paddle/pull/63592),[#63617](https://github.com/PaddlePaddle/Paddle/pull/63617),[#63518](https://github.com/PaddlePaddle/Paddle/pull/63518),[#63726](https://github.com/PaddlePaddle/Paddle/pull/63726),[#63853](https://github.com/PaddlePaddle/Paddle/pull/63853),[#63812](https://github.com/PaddlePaddle/Paddle/pull/63812),[#63811](https://github.com/PaddlePaddle/Paddle/pull/63811),[#64524](https://github.com/PaddlePaddle/Paddle/pull/64524),[#62993](https://github.com/PaddlePaddle/Paddle/pull/62993),[#63516](https://github.com/PaddlePaddle/Paddle/pull/63516),[#62998](https://github.com/PaddlePaddle/Paddle/pull/62998),[#63151](https://github.com/PaddlePaddle/Paddle/pull/63151),[#64661](https://github.com/PaddlePaddle/Paddle/pull/64661),[#64433](https://github.com/PaddlePaddle/Paddle/pull/64433),[#64448](https://github.com/PaddlePaddle/Paddle/pull/64448),[#63201](https://github.com/PaddlePaddle/Paddle/pull/63201),[#63230](https://github.com/PaddlePaddle/Paddle/pull/63230),[#63233](https://github.com/PaddlePaddle/Paddle/pull/63233),[#63281](https://github.com/PaddlePaddle/Paddle/pull/63281),[#64671](https://github.com/PaddlePaddle/Paddle/pull/64671),[#63274](https://github.com/PaddlePaddle/Paddle/pull/63274) -- 基于 PIR 实现 Sparse 相关逻辑,包含基础的 Type 类型和算子表达,并完成 Sparse 重点功能验证。 [#62868](https://github.com/PaddlePaddle/Paddle/pull/62868), [#63015](https://github.com/PaddlePaddle/Paddle/pull/63015), 
[#62894](https://github.com/PaddlePaddle/Paddle/pull/62894) - -### 动转静功能优化 -优化动转静基础能力,适配 SOT 训练场景下的动态维度,支持 Python3.12。 -- 完成动转静场景的 PIR 适配。[#60988](https://github.com/PaddlePaddle/Paddle/pull/60988), [#61936](https://github.com/PaddlePaddle/Paddle/pull/61936), [#59929](https://github.com/PaddlePaddle/Paddle/pull/59929), [#61790](https://github.com/PaddlePaddle/Paddle/pull/61790), [#64323](https://github.com/PaddlePaddle/Paddle/pull/64323), [#62030](https://github.com/PaddlePaddle/Paddle/pull/62030), [#61143](https://github.com/PaddlePaddle/Paddle/pull/61143), [#62680](https://github.com/PaddlePaddle/Paddle/pull/62680), [#63309](https://github.com/PaddlePaddle/Paddle/pull/63309), [#63311](https://github.com/PaddlePaddle/Paddle/pull/63311), [#62199](https://github.com/PaddlePaddle/Paddle/pull/62199) -- SOT 适配 Python 3.12 版本字节码,动转静 SOT 功能能够在 Python 3.12 版本使用。[#61414](https://github.com/PaddlePaddle/Paddle/pull/61414), [#59562](https://github.com/PaddlePaddle/Paddle/pull/59562), [#61031](https://github.com/PaddlePaddle/Paddle/pull/61031), [#61272](https://github.com/PaddlePaddle/Paddle/pull/61272), [#61412](https://github.com/PaddlePaddle/Paddle/pull/61412), [#61305](https://github.com/PaddlePaddle/Paddle/pull/61305), [#61964](https://github.com/PaddlePaddle/Paddle/pull/61964), [#62008](https://github.com/PaddlePaddle/Paddle/pull/62008), [#62028](https://github.com/PaddlePaddle/Paddle/pull/62028), [#61995](https://github.com/PaddlePaddle/Paddle/pull/61995), [#62073](https://github.com/PaddlePaddle/Paddle/pull/62073), [#62120](https://github.com/PaddlePaddle/Paddle/pull/62120), [#62218](https://github.com/PaddlePaddle/Paddle/pull/62218), [#62155](https://github.com/PaddlePaddle/Paddle/pull/62155) -- SOT 完成训练场景动态维度的适配,避免维度发生改变,触发重复构图,提升运行效率。[#64278](https://github.com/PaddlePaddle/Paddle/pull/64278), [#64435](https://github.com/PaddlePaddle/Paddle/pull/64435), [#64499](https://github.com/PaddlePaddle/Paddle/pull/64499), 
[#64500](https://github.com/PaddlePaddle/Paddle/pull/64500), [#62080](https://github.com/PaddlePaddle/Paddle/pull/62080) - -### 算子机制 -针对飞桨框架部分算子 Kernel 实现不完备、计算逻辑不高效等问题,我们对飞桨的部分算子功能和算子体系内部机制做了进一步的完善优化,修复部分已知问题,并新增了一些特性支持。 -- 针对 XPU Kernel,优化了`numel`、`concat`、`slice`等算子的数据类型支持,以及`AdamW`优化器的混合精度训练支持等。[#63715](https://github.com/PaddlePaddle/Paddle/pull/63715), [#61617](https://github.com/PaddlePaddle/Paddle/pull/61617), [#61694](https://github.com/PaddlePaddle/Paddle/pull/61694), [#64542](https://github.com/PaddlePaddle/Paddle/pull/64542), [#63644](https://github.com/PaddlePaddle/Paddle/pull/63644), [#61340](https://github.com/PaddlePaddle/Paddle/pull/61340), [#63108](https://github.com/PaddlePaddle/Paddle/pull/63108) -- 对部分算子进行了功能和性能的改进。[#59413](https://github.com/PaddlePaddle/Paddle/pull/59413), [#60295](https://github.com/PaddlePaddle/Paddle/pull/60295), [#64304](https://github.com/PaddlePaddle/Paddle/pull/64304), [#60979](https://github.com/PaddlePaddle/Paddle/pull/60979), [#63556](https://github.com/PaddlePaddle/Paddle/pull/63556), [#63061](https://github.com/PaddlePaddle/Paddle/pull/63061), [#62533](https://github.com/PaddlePaddle/Paddle/pull/62533) -- 完善组合算子的内部实现机制,并且为部分算子新增和优化组合拆分逻辑。[#59448](https://github.com/PaddlePaddle/Paddle/pull/59448), [#60505](https://github.com/PaddlePaddle/Paddle/pull/60505), [#59891](https://github.com/PaddlePaddle/Paddle/pull/59891), [#63161](https://github.com/PaddlePaddle/Paddle/pull/63161), [#63245](https://github.com/PaddlePaddle/Paddle/pull/63245), [#63782](https://github.com/PaddlePaddle/Paddle/pull/63782), [#64346](https://github.com/PaddlePaddle/Paddle/pull/64346), [#63156](https://github.com/PaddlePaddle/Paddle/pull/63156), [#63171](https://github.com/PaddlePaddle/Paddle/pull/63171), [#61315](https://github.com/PaddlePaddle/Paddle/pull/61315), [#61701](https://github.com/PaddlePaddle/Paddle/pull/61701), [#61874](https://github.com/PaddlePaddle/Paddle/pull/61874), 
[#61873](https://github.com/PaddlePaddle/Paddle/pull/61873), [#62059](https://github.com/PaddlePaddle/Paddle/pull/62059), [#61912](https://github.com/PaddlePaddle/Paddle/pull/61912), [#62112](https://github.com/PaddlePaddle/Paddle/pull/62112), [#63011](https://github.com/PaddlePaddle/Paddle/pull/63011), [#63009](https://github.com/PaddlePaddle/Paddle/pull/63009), [#64714](https://github.com/PaddlePaddle/Paddle/pull/64714) +- 修复 `paddle.nn.functional.max_unpool1d` 中当输入 output_size 为 tuple 时的 bug。 [#65910](https://github.com/PaddlePaddle/Paddle/pull/65910) +- 修复 `paddle.base.core.eager.Tensor` 中不支持 paddle::DataType 的问题。 [#66765](https://github.com/PaddlePaddle/Paddle/pull/66765) +- 修复打开 pir 开关时,bf16 训练报错的问题。 [#66833](https://github.com/PaddlePaddle/Paddle/pull/66833) +- 修复流水线并行中,线性层 bias 的问题。 [#67212](https://github.com/PaddlePaddle/Paddle/pull/67212) +- 修复流水线并行中,使用 loss 进行判断时的报错问题。 [#66980](https://github.com/PaddlePaddle/Paddle/pull/66980) +- 修复流水线并行中,使用`paddle.Tensor.item` 的报错问题。 [#67441](https://github.com/PaddlePaddle/Paddle/pull/67441) +- 修复 `paddle.einsum` 在特定场景的 bug。 [#67588](https://github.com/PaddlePaddle/Paddle/pull/67588) +- 修复 `paddle.nn.SyncBatchNorm` 在梯度计算时的报错问题。 [#67559](https://github.com/PaddlePaddle/Paddle/pull/67559) +- 修复 [issue #69992](https://github.com/PaddlePaddle/Paddle/issues/69992) 提到的问题。 [#70017](https://github.com/PaddlePaddle/Paddle/pull/70017) +- 修复 `paddle.arange` 在遇到大整数时,计算结果错误的问题。 [#70188](https://github.com/PaddlePaddle/Paddle/pull/70188) +- 修复 `paddle.max`、`paddle.min` 在输入存在 nan 时传播不正确问题。 [#70049](https://github.com/PaddlePaddle/Paddle/pull/70049) +- 修复 `paddle.linalg.svd`, `paddle.linalg.any` 等 API 在处理 0-size Tensor 时的问题。 [#70235](https://github.com/PaddlePaddle/Paddle/pull/70235), [#70489](https://github.com/PaddlePaddle/Paddle/pull/70489), [#70047](https://github.com/PaddlePaddle/Paddle/pull/70047), [#70103](https://github.com/PaddlePaddle/Paddle/pull/70103), [#70127](https://github.com/PaddlePaddle/Paddle/pull/70127), 
[#70098](https://github.com/PaddlePaddle/Paddle/pull/70098), [#70077](https://github.com/PaddlePaddle/Paddle/pull/70077), [#70130](https://github.com/PaddlePaddle/Paddle/pull/70130), [#70254](https://github.com/PaddlePaddle/Paddle/pull/70254), [#70125](https://github.com/PaddlePaddle/Paddle/pull/70125), [#70342](https://github.com/PaddlePaddle/Paddle/pull/70342), [#70369](https://github.com/PaddlePaddle/Paddle/pull/70369), [#71094](https://github.com/PaddlePaddle/Paddle/pull/71094), [#71089](https://github.com/PaddlePaddle/Paddle/pull/71089), [#71185](https://github.com/PaddlePaddle/Paddle/pull/71185), [#70537](https://github.com/PaddlePaddle/Paddle/pull/70537), [#70481](https://github.com/PaddlePaddle/Paddle/pull/70481) +- 修复一些类型提示标注的问题、文档问题等。[#65429](https://github.com/PaddlePaddle/Paddle/pull/65429), [#65496](https://github.com/PaddlePaddle/Paddle/pull/65496), [#65461](https://github.com/PaddlePaddle/Paddle/pull/65461), [#65542](https://github.com/PaddlePaddle/Paddle/pull/65542), [#65575](https://github.com/PaddlePaddle/Paddle/pull/65575), [#65545](https://github.com/PaddlePaddle/Paddle/pull/65545), [#65609](https://github.com/PaddlePaddle/Paddle/pull/65609), [#65644](https://github.com/PaddlePaddle/Paddle/pull/65644), [#65700](https://github.com/PaddlePaddle/Paddle/pull/65700), [#65697](https://github.com/PaddlePaddle/Paddle/pull/65697), [#65719](https://github.com/PaddlePaddle/Paddle/pull/65719), [#65639](https://github.com/PaddlePaddle/Paddle/pull/65639), [#65742](https://github.com/PaddlePaddle/Paddle/pull/65742), [#65891](https://github.com/PaddlePaddle/Paddle/pull/65891), [#65877](https://github.com/PaddlePaddle/Paddle/pull/65877), [#65895](https://github.com/PaddlePaddle/Paddle/pull/65895), [#66007](https://github.com/PaddlePaddle/Paddle/pull/66007), [#66679](https://github.com/PaddlePaddle/Paddle/pull/66679), [#66680](https://github.com/PaddlePaddle/Paddle/pull/66680), [#66676](https://github.com/PaddlePaddle/Paddle/pull/66676), 
[#66677](https://github.com/PaddlePaddle/Paddle/pull/66677), [#66884](https://github.com/PaddlePaddle/Paddle/pull/66884), [#67288](https://github.com/PaddlePaddle/Paddle/pull/67288), [#67302](https://github.com/PaddlePaddle/Paddle/pull/67302), [#66978](https://github.com/PaddlePaddle/Paddle/pull/66978), [#67295](https://github.com/PaddlePaddle/Paddle/pull/67295), [#67520](https://github.com/PaddlePaddle/Paddle/pull/67520), [#67421](https://github.com/PaddlePaddle/Paddle/pull/67421), [#67529](https://github.com/PaddlePaddle/Paddle/pull/67529), [#67536](https://github.com/PaddlePaddle/Paddle/pull/67536), [#67618](https://github.com/PaddlePaddle/Paddle/pull/67618), [#67661](https://github.com/PaddlePaddle/Paddle/pull/67661), [#67698](https://github.com/PaddlePaddle/Paddle/pull/67698), [#67800](https://github.com/PaddlePaddle/Paddle/pull/67800), [#67933](https://github.com/PaddlePaddle/Paddle/pull/67933), [#67893](https://github.com/PaddlePaddle/Paddle/pull/67893), [#68108](https://github.com/PaddlePaddle/Paddle/pull/68108), [#67927](https://github.com/PaddlePaddle/Paddle/pull/67927), [#68322](https://github.com/PaddlePaddle/Paddle/pull/68322), [#68341](https://github.com/PaddlePaddle/Paddle/pull/68341), [#68415](https://github.com/PaddlePaddle/Paddle/pull/68415), [#68372](https://github.com/PaddlePaddle/Paddle/pull/68372), [#68559](https://github.com/PaddlePaddle/Paddle/pull/68559), [#68598](https://github.com/PaddlePaddle/Paddle/pull/68598), [#68708](https://github.com/PaddlePaddle/Paddle/pull/68708), [#68780](https://github.com/PaddlePaddle/Paddle/pull/68780), [#68992](https://github.com/PaddlePaddle/Paddle/pull/68992), [#68989](https://github.com/PaddlePaddle/Paddle/pull/68989), [#68895](https://github.com/PaddlePaddle/Paddle/pull/68895), [#69014](https://github.com/PaddlePaddle/Paddle/pull/69014), [#69139](https://github.com/PaddlePaddle/Paddle/pull/69139), [#68996](https://github.com/PaddlePaddle/Paddle/pull/68996), 
[#69090](https://github.com/PaddlePaddle/Paddle/pull/69090), [#68922](https://github.com/PaddlePaddle/Paddle/pull/68922), [#69333](https://github.com/PaddlePaddle/Paddle/pull/69333), [#69141](https://github.com/PaddlePaddle/Paddle/pull/69141), [#69609](https://github.com/PaddlePaddle/Paddle/pull/69609), [#69652](https://github.com/PaddlePaddle/Paddle/pull/69652), [#69715](https://github.com/PaddlePaddle/Paddle/pull/69715), [#69716](https://github.com/PaddlePaddle/Paddle/pull/69716), [#69934](https://github.com/PaddlePaddle/Paddle/pull/69934), [#70253](https://github.com/PaddlePaddle/Paddle/pull/70253), [#70297](https://github.com/PaddlePaddle/Paddle/pull/70297), [#70252](https://github.com/PaddlePaddle/Paddle/pull/70252), [#70468](https://github.com/PaddlePaddle/Paddle/pull/70468), [#70102](https://github.com/PaddlePaddle/Paddle/pull/70102), [#70546](https://github.com/PaddlePaddle/Paddle/pull/70546), [#70616](https://github.com/PaddlePaddle/Paddle/pull/70616), [#70582](https://github.com/PaddlePaddle/Paddle/pull/70582), [#70635](https://github.com/PaddlePaddle/Paddle/pull/70635), [#70499](https://github.com/PaddlePaddle/Paddle/pull/70499), [#70755](https://github.com/PaddlePaddle/Paddle/pull/70755), [#70935](https://github.com/PaddlePaddle/Paddle/pull/70935), [#71133](https://github.com/PaddlePaddle/Paddle/pull/71133), [#71172](https://github.com/PaddlePaddle/Paddle/pull/71172), [#71238](https://github.com/PaddlePaddle/Paddle/pull/71238), [#71230](https://github.com/PaddlePaddle/Paddle/pull/71230), [#71394](https://github.com/PaddlePaddle/Paddle/pull/71394) + +### 文档优化 + +- 增强了若干 API 文档,使得文档易读和易懂。[#67772](https://github.com/PaddlePaddle/Paddle/pull/67772), [#69895](https://github.com/PaddlePaddle/Paddle/pull/69895), [#65904](https://github.com/PaddlePaddle/Paddle/pull/65904), [#66480](https://github.com/PaddlePaddle/Paddle/pull/66480), [#66974](https://github.com/PaddlePaddle/Paddle/pull/66974), [#67100](https://github.com/PaddlePaddle/Paddle/pull/67100), 
[#66991](https://github.com/PaddlePaddle/Paddle/pull/66991), [#67287](https://github.com/PaddlePaddle/Paddle/pull/67287), [#67841](https://github.com/PaddlePaddle/Paddle/pull/67841), [#68206](https://github.com/PaddlePaddle/Paddle/pull/68206), [#68305](https://github.com/PaddlePaddle/Paddle/pull/68305), [#68462](https://github.com/PaddlePaddle/Paddle/pull/68462), [#67061](https://github.com/PaddlePaddle/Paddle/pull/67061), [#66503](https://github.com/PaddlePaddle/Paddle/pull/66503), [#68856](https://github.com/PaddlePaddle/Paddle/pull/68856), [#68866](https://github.com/PaddlePaddle/Paddle/pull/68866), [#68768](https://github.com/PaddlePaddle/Paddle/pull/68768), [#69215](https://github.com/PaddlePaddle/Paddle/pull/69215), [#69449](https://github.com/PaddlePaddle/Paddle/pull/69449), [#69396](https://github.com/PaddlePaddle/Paddle/pull/69396), [#69498](https://github.com/PaddlePaddle/Paddle/pull/69498), [#69413](https://github.com/PaddlePaddle/Paddle/pull/69413), [#69404](https://github.com/PaddlePaddle/Paddle/pull/69404), [#69729](https://github.com/PaddlePaddle/Paddle/pull/69729), [#69749](https://github.com/PaddlePaddle/Paddle/pull/69749), [#69266](https://github.com/PaddlePaddle/Paddle/pull/69266), [#69989](https://github.com/PaddlePaddle/Paddle/pull/69989), [#70209](https://github.com/PaddlePaddle/Paddle/pull/70209), [#70128](https://github.com/PaddlePaddle/Paddle/pull/70128), [#70143](https://github.com/PaddlePaddle/Paddle/pull/70143), [#69874](https://github.com/PaddlePaddle/Paddle/pull/69874), [#70242](https://github.com/PaddlePaddle/Paddle/pull/70242), [#70145](https://github.com/PaddlePaddle/Paddle/pull/70145), [#70813](https://github.com/PaddlePaddle/Paddle/pull/70813), [#71046](https://github.com/PaddlePaddle/Paddle/pull/71046)
+
+## 2. Basic Execution Architecture
+
+PIR has been fully rolled out and is enabled by default. It supports one-click dynamic-to-static conversion, which ensures the framework's excellent performance and good extensibility.
 ### Bug Fixes
-- Fixed bugs related to PIR, the executor, dynamic-to-static conversion, and others. [#64442](https://github.com/PaddlePaddle/Paddle/pull/64442), [#60443](https://github.com/PaddlePaddle/Paddle/pull/60443), [#60122](https://github.com/PaddlePaddle/Paddle/pull/60122), [#60625](https://github.com/PaddlePaddle/Paddle/pull/60625), [#60607](https://github.com/PaddlePaddle/Paddle/pull/60607), [#60705](https://github.com/PaddlePaddle/Paddle/pull/60705), [#61110](https://github.com/PaddlePaddle/Paddle/pull/61110), [#61278](https://github.com/PaddlePaddle/Paddle/pull/61278), [#61448](https://github.com/PaddlePaddle/Paddle/pull/61448), [#61491](https://github.com/PaddlePaddle/Paddle/pull/61491), [#61692](https://github.com/PaddlePaddle/Paddle/pull/61692), [#62100](https://github.com/PaddlePaddle/Paddle/pull/62100), [#62239](https://github.com/PaddlePaddle/Paddle/pull/62239), [#62365](https://github.com/PaddlePaddle/Paddle/pull/62365), [#62758](https://github.com/PaddlePaddle/Paddle/pull/62758), [#63395](https://github.com/PaddlePaddle/Paddle/pull/63395), [#64272](https://github.com/PaddlePaddle/Paddle/pull/64272), [#62165](https://github.com/PaddlePaddle/Paddle/pull/62165), [#64151](https://github.com/PaddlePaddle/Paddle/pull/64151), [#64204](https://github.com/PaddlePaddle/Paddle/pull/64204), [#64815](https://github.com/PaddlePaddle/Paddle/pull/64815), [#63757](https://github.com/PaddlePaddle/Paddle/pull/63757), [#61972](https://github.com/PaddlePaddle/Paddle/pull/61972), [#64806](https://github.com/PaddlePaddle/Paddle/pull/64806), [#60010](https://github.com/PaddlePaddle/Paddle/pull/60010), [#60461](https://github.com/PaddlePaddle/Paddle/pull/60461), [#60310](https://github.com/PaddlePaddle/Paddle/pull/60310), [#62006](https://github.com/PaddlePaddle/Paddle/pull/62006), [#61591](https://github.com/PaddlePaddle/Paddle/pull/61591), [#60327](https://github.com/PaddlePaddle/Paddle/pull/60327), [#60720](https://github.com/PaddlePaddle/Paddle/pull/60720),
[#64656](https://github.com/PaddlePaddle/Paddle/pull/64656), [#60236](https://github.com/PaddlePaddle/Paddle/pull/60236), [#60684](https://github.com/PaddlePaddle/Paddle/pull/60684), [#60790](https://github.com/PaddlePaddle/Paddle/pull/60790), [#60944](https://github.com/PaddlePaddle/Paddle/pull/60944), [#62056](https://github.com/PaddlePaddle/Paddle/pull/62056), [#62891](https://github.com/PaddlePaddle/Paddle/pull/62891), [#64676](https://github.com/PaddlePaddle/Paddle/pull/64676), [#60271](https://github.com/PaddlePaddle/Paddle/pull/60271), [#60634](https://github.com/PaddlePaddle/Paddle/pull/60634), [#60663](https://github.com/PaddlePaddle/Paddle/pull/60663), [#60827](https://github.com/PaddlePaddle/Paddle/pull/60827), [#60845](https://github.com/PaddlePaddle/Paddle/pull/60845), [#60905](https://github.com/PaddlePaddle/Paddle/pull/60905), [#60945](https://github.com/PaddlePaddle/Paddle/pull/60945), [#60949](https://github.com/PaddlePaddle/Paddle/pull/60949), [#61107](https://github.com/PaddlePaddle/Paddle/pull/61107), [#61111](https://github.com/PaddlePaddle/Paddle/pull/61111), [#61117](https://github.com/PaddlePaddle/Paddle/pull/61117), [#61158](https://github.com/PaddlePaddle/Paddle/pull/61158), [#61177](https://github.com/PaddlePaddle/Paddle/pull/61177), [#61355](https://github.com/PaddlePaddle/Paddle/pull/61355), [#61593](https://github.com/PaddlePaddle/Paddle/pull/61593), [#61666](https://github.com/PaddlePaddle/Paddle/pull/61666), [#61934](https://github.com/PaddlePaddle/Paddle/pull/61934), [#62216](https://github.com/PaddlePaddle/Paddle/pull/62216), [#62491](https://github.com/PaddlePaddle/Paddle/pull/62491), [#62515](https://github.com/PaddlePaddle/Paddle/pull/62515), [#62594](https://github.com/PaddlePaddle/Paddle/pull/62594), [#62605](https://github.com/PaddlePaddle/Paddle/pull/62605), [#62895](https://github.com/PaddlePaddle/Paddle/pull/62895), [#62913](https://github.com/PaddlePaddle/Paddle/pull/62913), 
[#64413](https://github.com/PaddlePaddle/Paddle/pull/64413), [#59947](https://github.com/PaddlePaddle/Paddle/pull/59947), [#60264](https://github.com/PaddlePaddle/Paddle/pull/60264), [#60721](https://github.com/PaddlePaddle/Paddle/pull/60721), [#63113](https://github.com/PaddlePaddle/Paddle/pull/63113), [#63629](https://github.com/PaddlePaddle/Paddle/pull/63629), [#64300](https://github.com/PaddlePaddle/Paddle/pull/64300), [#64450](https://github.com/PaddlePaddle/Paddle/pull/64450), [#64532](https://github.com/PaddlePaddle/Paddle/pull/64532), [#64561](https://github.com/PaddlePaddle/Paddle/pull/64561), [#64625](https://github.com/PaddlePaddle/Paddle/pull/64625), [#64731](https://github.com/PaddlePaddle/Paddle/pull/64731), [#60059](https://github.com/PaddlePaddle/Paddle/pull/60059), [#60487](https://github.com/PaddlePaddle/Paddle/pull/60487), [#60423](https://github.com/PaddlePaddle/Paddle/pull/60423), [#61599](https://github.com/PaddlePaddle/Paddle/pull/61599), [#62032](https://github.com/PaddlePaddle/Paddle/pull/62032), [#62686](https://github.com/PaddlePaddle/Paddle/pull/62686), [#64055](https://github.com/PaddlePaddle/Paddle/pull/64055), [#60751](https://github.com/PaddlePaddle/Paddle/pull/60751), [#61646](https://github.com/PaddlePaddle/Paddle/pull/61646), [#60454](https://github.com/PaddlePaddle/Paddle/pull/60454), [#62530](https://github.com/PaddlePaddle/Paddle/pull/62530), [#62821](https://github.com/PaddlePaddle/Paddle/pull/62821), [#64454](https://github.com/PaddlePaddle/Paddle/pull/64454), [#64754](https://github.com/PaddlePaddle/Paddle/pull/64754), [#59860](https://github.com/PaddlePaddle/Paddle/pull/59860), [#60280](https://github.com/PaddlePaddle/Paddle/pull/60280), [#60357](https://github.com/PaddlePaddle/Paddle/pull/60357), [#60363](https://github.com/PaddlePaddle/Paddle/pull/60363), [#60900](https://github.com/PaddlePaddle/Paddle/pull/60900), [#61185](https://github.com/PaddlePaddle/Paddle/pull/61185), 
[#61505](https://github.com/PaddlePaddle/Paddle/pull/61505), [#61644](https://github.com/PaddlePaddle/Paddle/pull/61644), [#62256](https://github.com/PaddlePaddle/Paddle/pull/62256), [#62396](https://github.com/PaddlePaddle/Paddle/pull/62396), [#63040](https://github.com/PaddlePaddle/Paddle/pull/63040), [#63409](https://github.com/PaddlePaddle/Paddle/pull/63409), [#63764](https://github.com/PaddlePaddle/Paddle/pull/63764), [#59571](https://github.com/PaddlePaddle/Paddle/pull/59571), [#59894](https://github.com/PaddlePaddle/Paddle/pull/59894), [#59569](https://github.com/PaddlePaddle/Paddle/pull/59569), [#59896](https://github.com/PaddlePaddle/Paddle/pull/59896), [#60015](https://github.com/PaddlePaddle/Paddle/pull/60015), [#60081](https://github.com/PaddlePaddle/Paddle/pull/60081), [#60164](https://github.com/PaddlePaddle/Paddle/pull/60164), [#60200](https://github.com/PaddlePaddle/Paddle/pull/60200), [#60211](https://github.com/PaddlePaddle/Paddle/pull/60211), [#60267](https://github.com/PaddlePaddle/Paddle/pull/60267), [#60458](https://github.com/PaddlePaddle/Paddle/pull/60458), [#60395](https://github.com/PaddlePaddle/Paddle/pull/60395), [#60907](https://github.com/PaddlePaddle/Paddle/pull/60907), [#60707](https://github.com/PaddlePaddle/Paddle/pull/60707), [#60993](https://github.com/PaddlePaddle/Paddle/pull/60993), [#61401](https://github.com/PaddlePaddle/Paddle/pull/61401), [#61433](https://github.com/PaddlePaddle/Paddle/pull/61433), [#61450](https://github.com/PaddlePaddle/Paddle/pull/61450), [#61577](https://github.com/PaddlePaddle/Paddle/pull/61577), [#61575](https://github.com/PaddlePaddle/Paddle/pull/61575), [#61703](https://github.com/PaddlePaddle/Paddle/pull/61703), [#61711](https://github.com/PaddlePaddle/Paddle/pull/61711), [#61883](https://github.com/PaddlePaddle/Paddle/pull/61883), [#61822](https://github.com/PaddlePaddle/Paddle/pull/61822), [#62012](https://github.com/PaddlePaddle/Paddle/pull/62012), 
[#61858](https://github.com/PaddlePaddle/Paddle/pull/61858), [#62176](https://github.com/PaddlePaddle/Paddle/pull/62176), [#62257](https://github.com/PaddlePaddle/Paddle/pull/62257), [#62470](https://github.com/PaddlePaddle/Paddle/pull/62470), [#62536](https://github.com/PaddlePaddle/Paddle/pull/62536), [#62606](https://github.com/PaddlePaddle/Paddle/pull/62606), [#62808](https://github.com/PaddlePaddle/Paddle/pull/62808), [#62854](https://github.com/PaddlePaddle/Paddle/pull/62854), [#62879](https://github.com/PaddlePaddle/Paddle/pull/62879), [#62864](https://github.com/PaddlePaddle/Paddle/pull/62864), [#63063](https://github.com/PaddlePaddle/Paddle/pull/63063), [#62958](https://github.com/PaddlePaddle/Paddle/pull/62958), [#63397](https://github.com/PaddlePaddle/Paddle/pull/63397), [#63805](https://github.com/PaddlePaddle/Paddle/pull/63805), [#63694](https://github.com/PaddlePaddle/Paddle/pull/63694), [#64168](https://github.com/PaddlePaddle/Paddle/pull/64168), [#64184](https://github.com/PaddlePaddle/Paddle/pull/64184), [#64174](https://github.com/PaddlePaddle/Paddle/pull/64174), [#64315](https://github.com/PaddlePaddle/Paddle/pull/64315), [#64362](https://github.com/PaddlePaddle/Paddle/pull/64362), [#64400](https://github.com/PaddlePaddle/Paddle/pull/64400), [#64475](https://github.com/PaddlePaddle/Paddle/pull/64475), [#64458](https://github.com/PaddlePaddle/Paddle/pull/64458), [#64548](https://github.com/PaddlePaddle/Paddle/pull/64548), [#59858](https://github.com/PaddlePaddle/Paddle/pull/59858), [#61132](https://github.com/PaddlePaddle/Paddle/pull/61132), [#62010](https://github.com/PaddlePaddle/Paddle/pull/62010), [#62069](https://github.com/PaddlePaddle/Paddle/pull/62069), [#62707](https://github.com/PaddlePaddle/Paddle/pull/62707), [#62921](https://github.com/PaddlePaddle/Paddle/pull/62921), [#63085](https://github.com/PaddlePaddle/Paddle/pull/63085), [#63321](https://github.com/PaddlePaddle/Paddle/pull/63321), 
[#63351](https://github.com/PaddlePaddle/Paddle/pull/63351), [#63549](https://github.com/PaddlePaddle/Paddle/pull/63549), [#64567](https://github.com/PaddlePaddle/Paddle/pull/64567), [#59936](https://github.com/PaddlePaddle/Paddle/pull/59936), [#60269](https://github.com/PaddlePaddle/Paddle/pull/60269), [#60879](https://github.com/PaddlePaddle/Paddle/pull/60879), [#61314](https://github.com/PaddlePaddle/Paddle/pull/61314), [#61391](https://github.com/PaddlePaddle/Paddle/pull/61391), [#61479](https://github.com/PaddlePaddle/Paddle/pull/61479), [#61789](https://github.com/PaddlePaddle/Paddle/pull/61789), [#61832](https://github.com/PaddlePaddle/Paddle/pull/61832), [#61864](https://github.com/PaddlePaddle/Paddle/pull/61864), [#61917](https://github.com/PaddlePaddle/Paddle/pull/61917), [#62052](https://github.com/PaddlePaddle/Paddle/pull/62052), [#62068](https://github.com/PaddlePaddle/Paddle/pull/62068), [#62293](https://github.com/PaddlePaddle/Paddle/pull/62293), [#62479](https://github.com/PaddlePaddle/Paddle/pull/62479), [#62506](https://github.com/PaddlePaddle/Paddle/pull/62506), [#59948](https://github.com/PaddlePaddle/Paddle/pull/59948), [#64118](https://github.com/PaddlePaddle/Paddle/pull/64118), [#64126](https://github.com/PaddlePaddle/Paddle/pull/64126), [#64195](https://github.com/PaddlePaddle/Paddle/pull/64195), [#64307](https://github.com/PaddlePaddle/Paddle/pull/64307), [#64314](https://github.com/PaddlePaddle/Paddle/pull/64314), [#64276](https://github.com/PaddlePaddle/Paddle/pull/64276), [#64312](https://github.com/PaddlePaddle/Paddle/pull/64312), [#64350](https://github.com/PaddlePaddle/Paddle/pull/64350), [#64319](https://github.com/PaddlePaddle/Paddle/pull/64319), [#64463](https://github.com/PaddlePaddle/Paddle/pull/64463), [#64457](https://github.com/PaddlePaddle/Paddle/pull/64457), [#64455](https://github.com/PaddlePaddle/Paddle/pull/64455), [#64487](https://github.com/PaddlePaddle/Paddle/pull/64487), 
[#64645](https://github.com/PaddlePaddle/Paddle/pull/64645), [#63155](https://github.com/PaddlePaddle/Paddle/pull/63155), [#59893](https://github.com/PaddlePaddle/Paddle/pull/59893), [#63332](https://github.com/PaddlePaddle/Paddle/pull/63332), [#64786](https://github.com/PaddlePaddle/Paddle/pull/64786), [#60515](https://github.com/PaddlePaddle/Paddle/pull/60515), [#60627](https://github.com/PaddlePaddle/Paddle/pull/60627), [#60863](https://github.com/PaddlePaddle/Paddle/pull/60863), [#60854](https://github.com/PaddlePaddle/Paddle/pull/60854), [#61447](https://github.com/PaddlePaddle/Paddle/pull/61447), [#61440](https://github.com/PaddlePaddle/Paddle/pull/61440), [#61932](https://github.com/PaddlePaddle/Paddle/pull/61932), [#62131](https://github.com/PaddlePaddle/Paddle/pull/62131), [#62252](https://github.com/PaddlePaddle/Paddle/pull/62252), [#62283](https://github.com/PaddlePaddle/Paddle/pull/62283), [#62358](https://github.com/PaddlePaddle/Paddle/pull/62358), [#62411](https://github.com/PaddlePaddle/Paddle/pull/62411), [#62424](https://github.com/PaddlePaddle/Paddle/pull/62424), [#62810](https://github.com/PaddlePaddle/Paddle/pull/62810), [#62811](https://github.com/PaddlePaddle/Paddle/pull/62811), [#62896](https://github.com/PaddlePaddle/Paddle/pull/62896), [#62947](https://github.com/PaddlePaddle/Paddle/pull/62947), [#63182](https://github.com/PaddlePaddle/Paddle/pull/63182), [#63190](https://github.com/PaddlePaddle/Paddle/pull/63190), [#63294](https://github.com/PaddlePaddle/Paddle/pull/63294), [#63306](https://github.com/PaddlePaddle/Paddle/pull/63306), [#63352](https://github.com/PaddlePaddle/Paddle/pull/63352), [#63404](https://github.com/PaddlePaddle/Paddle/pull/63404), [#63474](https://github.com/PaddlePaddle/Paddle/pull/63474), [#64013](https://github.com/PaddlePaddle/Paddle/pull/64013),
[#64674](https://github.com/PaddlePaddle/Paddle/pull/64674), [#60055](https://github.com/PaddlePaddle/Paddle/pull/60055), [#62050](https://github.com/PaddlePaddle/Paddle/pull/62050), [#62770](https://github.com/PaddlePaddle/Paddle/pull/62770), [#63234](https://github.com/PaddlePaddle/Paddle/pull/63234), [#63374](https://github.com/PaddlePaddle/Paddle/pull/63374), [#64277](https://github.com/PaddlePaddle/Paddle/pull/64277), [#63420](https://github.com/PaddlePaddle/Paddle/pull/63420), [#60312](https://github.com/PaddlePaddle/Paddle/pull/60312), [#63810](https://github.com/PaddlePaddle/Paddle/pull/63810), [#64631](https://github.com/PaddlePaddle/Paddle/pull/64631), [#63970](https://github.com/PaddlePaddle/Paddle/pull/63970), [#63708](https://github.com/PaddlePaddle/Paddle/pull/63708), [#62062](https://github.com/PaddlePaddle/Paddle/pull/62062), [#60898](https://github.com/PaddlePaddle/Paddle/pull/60898), [#62373](https://github.com/PaddlePaddle/Paddle/pull/62373), [#59878](https://github.com/PaddlePaddle/Paddle/pull/59878)
-- Fixed bugs in some operator mechanisms, operator implementation logic, and the related unit tests. [#63792](https://github.com/PaddlePaddle/Paddle/pull/63792), [#60570](https://github.com/PaddlePaddle/Paddle/pull/60570), [#61572](https://github.com/PaddlePaddle/Paddle/pull/61572), [#59971](https://github.com/PaddlePaddle/Paddle/pull/59971), [#61336](https://github.com/PaddlePaddle/Paddle/pull/61336), [#63276](https://github.com/PaddlePaddle/Paddle/pull/63276), [#63251](https://github.com/PaddlePaddle/Paddle/pull/63251), [#63697](https://github.com/PaddlePaddle/Paddle/pull/63697), [#63706](https://github.com/PaddlePaddle/Paddle/pull/63706), [#64685](https://github.com/PaddlePaddle/Paddle/pull/64685), [#64009](https://github.com/PaddlePaddle/Paddle/pull/64009), [#62461](https://github.com/PaddlePaddle/Paddle/pull/62461), [#61568](https://github.com/PaddlePaddle/Paddle/pull/61568), [#63912](https://github.com/PaddlePaddle/Paddle/pull/63912), [#60475](https://github.com/PaddlePaddle/Paddle/pull/60475),
[#60222](https://github.com/PaddlePaddle/Paddle/pull/60222), [#63961](https://github.com/PaddlePaddle/Paddle/pull/63961), [#63593](https://github.com/PaddlePaddle/Paddle/pull/63593)
-### Developer-Related Content
-- Developer-related changes, including PRs for the PIR switch, enabling unit tests, and functional verification. [#60621](https://github.com/PaddlePaddle/Paddle/pull/60621), [#59703](https://github.com/PaddlePaddle/Paddle/pull/59703), [#59694](https://github.com/PaddlePaddle/Paddle/pull/59694), [#59717](https://github.com/PaddlePaddle/Paddle/pull/59717), [#59729](https://github.com/PaddlePaddle/Paddle/pull/59729), [#59730](https://github.com/PaddlePaddle/Paddle/pull/59730), [#60216](https://github.com/PaddlePaddle/Paddle/pull/60216), [#60238](https://github.com/PaddlePaddle/Paddle/pull/60238), [#60246](https://github.com/PaddlePaddle/Paddle/pull/60246), [#60343](https://github.com/PaddlePaddle/Paddle/pull/60343), [#60302](https://github.com/PaddlePaddle/Paddle/pull/60302), [#60870](https://github.com/PaddlePaddle/Paddle/pull/60870), [#59956](https://github.com/PaddlePaddle/Paddle/pull/59956), [#60795](https://github.com/PaddlePaddle/Paddle/pull/60795), [#62528](https://github.com/PaddlePaddle/Paddle/pull/62528), [#59932](https://github.com/PaddlePaddle/Paddle/pull/59932), [#59636](https://github.com/PaddlePaddle/Paddle/pull/59636), [#59959](https://github.com/PaddlePaddle/Paddle/pull/59959), [#59734](https://github.com/PaddlePaddle/Paddle/pull/59734), [#60287](https://github.com/PaddlePaddle/Paddle/pull/60287), [#60347](https://github.com/PaddlePaddle/Paddle/pull/60347), [#60335](https://github.com/PaddlePaddle/Paddle/pull/60335), [#60332](https://github.com/PaddlePaddle/Paddle/pull/60332), [#59631](https://github.com/PaddlePaddle/Paddle/pull/59631), [#60255](https://github.com/PaddlePaddle/Paddle/pull/60255), [#60329](https://github.com/PaddlePaddle/Paddle/pull/60329), [#60401](https://github.com/PaddlePaddle/Paddle/pull/60401), [#60522](https://github.com/PaddlePaddle/Paddle/pull/60522), [#60792](https://github.com/PaddlePaddle/Paddle/pull/60792),
[#59617](https://github.com/PaddlePaddle/Paddle/pull/59617), [#60277](https://github.com/PaddlePaddle/Paddle/pull/60277), [#60584](https://github.com/PaddlePaddle/Paddle/pull/60584), [#60911](https://github.com/PaddlePaddle/Paddle/pull/60911), [#61322](https://github.com/PaddlePaddle/Paddle/pull/61322), [#60838](https://github.com/PaddlePaddle/Paddle/pull/60838), [#60602](https://github.com/PaddlePaddle/Paddle/pull/60602), [#61458](https://github.com/PaddlePaddle/Paddle/pull/61458), [#61607](https://github.com/PaddlePaddle/Paddle/pull/61607), [#61960](https://github.com/PaddlePaddle/Paddle/pull/61960), [#60484](https://github.com/PaddlePaddle/Paddle/pull/60484), [#61662](https://github.com/PaddlePaddle/Paddle/pull/61662), [#62263](https://github.com/PaddlePaddle/Paddle/pull/62263), [#62270](https://github.com/PaddlePaddle/Paddle/pull/62270), [#62469](https://github.com/PaddlePaddle/Paddle/pull/62469), [#62416](https://github.com/PaddlePaddle/Paddle/pull/62416), [#62443](https://github.com/PaddlePaddle/Paddle/pull/62443), [#62412](https://github.com/PaddlePaddle/Paddle/pull/62412), [#62541](https://github.com/PaddlePaddle/Paddle/pull/62541), [#62634](https://github.com/PaddlePaddle/Paddle/pull/62634), [#62369](https://github.com/PaddlePaddle/Paddle/pull/62369), [#60805](https://github.com/PaddlePaddle/Paddle/pull/60805), [#62644](https://github.com/PaddlePaddle/Paddle/pull/62644), [#62494](https://github.com/PaddlePaddle/Paddle/pull/62494), [#62767](https://github.com/PaddlePaddle/Paddle/pull/62767), [#62735](https://github.com/PaddlePaddle/Paddle/pull/62735), [#62802](https://github.com/PaddlePaddle/Paddle/pull/62802), [#62801](https://github.com/PaddlePaddle/Paddle/pull/62801), [#62783](https://github.com/PaddlePaddle/Paddle/pull/62783), [#62579](https://github.com/PaddlePaddle/Paddle/pull/62579), [#62833](https://github.com/PaddlePaddle/Paddle/pull/62833), [#62668](https://github.com/PaddlePaddle/Paddle/pull/62668), 
[#62972](https://github.com/PaddlePaddle/Paddle/pull/62972), [#62505](https://github.com/PaddlePaddle/Paddle/pull/62505), [#63005](https://github.com/PaddlePaddle/Paddle/pull/63005), [#62900](https://github.com/PaddlePaddle/Paddle/pull/62900), [#60577](https://github.com/PaddlePaddle/Paddle/pull/60577), [#60877](https://github.com/PaddlePaddle/Paddle/pull/60877), [#61076](https://github.com/PaddlePaddle/Paddle/pull/61076), [#61038](https://github.com/PaddlePaddle/Paddle/pull/61038), [#61112](https://github.com/PaddlePaddle/Paddle/pull/61112), [#61120](https://github.com/PaddlePaddle/Paddle/pull/61120), [#61582](https://github.com/PaddlePaddle/Paddle/pull/61582), [#61119](https://github.com/PaddlePaddle/Paddle/pull/61119), [#61036](https://github.com/PaddlePaddle/Paddle/pull/61036), [#61289](https://github.com/PaddlePaddle/Paddle/pull/61289), [#60695](https://github.com/PaddlePaddle/Paddle/pull/60695), [#61039](https://github.com/PaddlePaddle/Paddle/pull/61039), [#61963](https://github.com/PaddlePaddle/Paddle/pull/61963), [#62118](https://github.com/PaddlePaddle/Paddle/pull/62118), [#62797](https://github.com/PaddlePaddle/Paddle/pull/62797), [#62807](https://github.com/PaddlePaddle/Paddle/pull/62807), [#62887](https://github.com/PaddlePaddle/Paddle/pull/62887), [#62830](https://github.com/PaddlePaddle/Paddle/pull/62830), [#62849](https://github.com/PaddlePaddle/Paddle/pull/62849), [#62750](https://github.com/PaddlePaddle/Paddle/pull/62750), [#62965](https://github.com/PaddlePaddle/Paddle/pull/62965), [#59742](https://github.com/PaddlePaddle/Paddle/pull/59742), [#59867](https://github.com/PaddlePaddle/Paddle/pull/59867), [#60836](https://github.com/PaddlePaddle/Paddle/pull/60836), [#60902](https://github.com/PaddlePaddle/Paddle/pull/60902), [#61228](https://github.com/PaddlePaddle/Paddle/pull/61228), [#60037](https://github.com/PaddlePaddle/Paddle/pull/60037), [#60079](https://github.com/PaddlePaddle/Paddle/pull/60079), 
[#60173](https://github.com/PaddlePaddle/Paddle/pull/60173), [#60373](https://github.com/PaddlePaddle/Paddle/pull/60373), [#60380](https://github.com/PaddlePaddle/Paddle/pull/60380), [#60381](https://github.com/PaddlePaddle/Paddle/pull/60381), [#60750](https://github.com/PaddlePaddle/Paddle/pull/60750), [#61065](https://github.com/PaddlePaddle/Paddle/pull/61065), [#61122](https://github.com/PaddlePaddle/Paddle/pull/61122), [#61074](https://github.com/PaddlePaddle/Paddle/pull/61074), [#61204](https://github.com/PaddlePaddle/Paddle/pull/61204), [#61191](https://github.com/PaddlePaddle/Paddle/pull/61191), [#61182](https://github.com/PaddlePaddle/Paddle/pull/61182), [#61219](https://github.com/PaddlePaddle/Paddle/pull/61219), [#61296](https://github.com/PaddlePaddle/Paddle/pull/61296), [#61503](https://github.com/PaddlePaddle/Paddle/pull/61503), [#61484](https://github.com/PaddlePaddle/Paddle/pull/61484), [#61513](https://github.com/PaddlePaddle/Paddle/pull/61513), [#61476](https://github.com/PaddlePaddle/Paddle/pull/61476), [#61510](https://github.com/PaddlePaddle/Paddle/pull/61510), [#61511](https://github.com/PaddlePaddle/Paddle/pull/61511), [#61526](https://github.com/PaddlePaddle/Paddle/pull/61526), [#61524](https://github.com/PaddlePaddle/Paddle/pull/61524), [#61525](https://github.com/PaddlePaddle/Paddle/pull/61525), [#61466](https://github.com/PaddlePaddle/Paddle/pull/61466), [#61497](https://github.com/PaddlePaddle/Paddle/pull/61497), [#61538](https://github.com/PaddlePaddle/Paddle/pull/61538), [#61533](https://github.com/PaddlePaddle/Paddle/pull/61533), [#61530](https://github.com/PaddlePaddle/Paddle/pull/61530), [#61468](https://github.com/PaddlePaddle/Paddle/pull/61468), [#61527](https://github.com/PaddlePaddle/Paddle/pull/61527), [#61535](https://github.com/PaddlePaddle/Paddle/pull/61535), [#61512](https://github.com/PaddlePaddle/Paddle/pull/61512), [#61531](https://github.com/PaddlePaddle/Paddle/pull/61531), 
[#61539](https://github.com/PaddlePaddle/Paddle/pull/61539), [#61532](https://github.com/PaddlePaddle/Paddle/pull/61532), [#61521](https://github.com/PaddlePaddle/Paddle/pull/61521), [#61517](https://github.com/PaddlePaddle/Paddle/pull/61517), [#61518](https://github.com/PaddlePaddle/Paddle/pull/61518), [#61550](https://github.com/PaddlePaddle/Paddle/pull/61550), [#61545](https://github.com/PaddlePaddle/Paddle/pull/61545), [#61548](https://github.com/PaddlePaddle/Paddle/pull/61548), [#61519](https://github.com/PaddlePaddle/Paddle/pull/61519), [#61549](https://github.com/PaddlePaddle/Paddle/pull/61549), [#61574](https://github.com/PaddlePaddle/Paddle/pull/61574), [#61585](https://github.com/PaddlePaddle/Paddle/pull/61585), [#61581](https://github.com/PaddlePaddle/Paddle/pull/61581), [#61553](https://github.com/PaddlePaddle/Paddle/pull/61553), [#61504](https://github.com/PaddlePaddle/Paddle/pull/61504), [#61603](https://github.com/PaddlePaddle/Paddle/pull/61603), [#61534](https://github.com/PaddlePaddle/Paddle/pull/61534), [#61567](https://github.com/PaddlePaddle/Paddle/pull/61567), [#61523](https://github.com/PaddlePaddle/Paddle/pull/61523), [#61565](https://github.com/PaddlePaddle/Paddle/pull/61565), [#61564](https://github.com/PaddlePaddle/Paddle/pull/61564), [#61707](https://github.com/PaddlePaddle/Paddle/pull/61707), [#61560](https://github.com/PaddlePaddle/Paddle/pull/61560), [#61684](https://github.com/PaddlePaddle/Paddle/pull/61684), [#61706](https://github.com/PaddlePaddle/Paddle/pull/61706), [#61724](https://github.com/PaddlePaddle/Paddle/pull/61724), [#61719](https://github.com/PaddlePaddle/Paddle/pull/61719), [#61729](https://github.com/PaddlePaddle/Paddle/pull/61729), [#61763](https://github.com/PaddlePaddle/Paddle/pull/61763), [#61755](https://github.com/PaddlePaddle/Paddle/pull/61755), [#61737](https://github.com/PaddlePaddle/Paddle/pull/61737), [#61750](https://github.com/PaddlePaddle/Paddle/pull/61750), 
[#61753](https://github.com/PaddlePaddle/Paddle/pull/61753), [#61756](https://github.com/PaddlePaddle/Paddle/pull/61756), [#61777](https://github.com/PaddlePaddle/Paddle/pull/61777), [#61758](https://github.com/PaddlePaddle/Paddle/pull/61758), [#61731](https://github.com/PaddlePaddle/Paddle/pull/61731), [#61771](https://github.com/PaddlePaddle/Paddle/pull/61771), [#61739](https://github.com/PaddlePaddle/Paddle/pull/61739), [#61559](https://github.com/PaddlePaddle/Paddle/pull/61559), [#61717](https://github.com/PaddlePaddle/Paddle/pull/61717), [#61733](https://github.com/PaddlePaddle/Paddle/pull/61733), [#61563](https://github.com/PaddlePaddle/Paddle/pull/61563), [#61546](https://github.com/PaddlePaddle/Paddle/pull/61546), [#61566](https://github.com/PaddlePaddle/Paddle/pull/61566), [#61562](https://github.com/PaddlePaddle/Paddle/pull/61562), [#61793](https://github.com/PaddlePaddle/Paddle/pull/61793), [#61902](https://github.com/PaddlePaddle/Paddle/pull/61902), [#61905](https://github.com/PaddlePaddle/Paddle/pull/61905), [#61904](https://github.com/PaddlePaddle/Paddle/pull/61904), [#62227](https://github.com/PaddlePaddle/Paddle/pull/62227), [#62332](https://github.com/PaddlePaddle/Paddle/pull/62332), [#62653](https://github.com/PaddlePaddle/Paddle/pull/62653), [#62681](https://github.com/PaddlePaddle/Paddle/pull/62681), [#62709](https://github.com/PaddlePaddle/Paddle/pull/62709), [#62794](https://github.com/PaddlePaddle/Paddle/pull/62794), [#62938](https://github.com/PaddlePaddle/Paddle/pull/62938), [#63185](https://github.com/PaddlePaddle/Paddle/pull/63185), [#63754](https://github.com/PaddlePaddle/Paddle/pull/63754), [#63769](https://github.com/PaddlePaddle/Paddle/pull/63769), [#63793](https://github.com/PaddlePaddle/Paddle/pull/63793), [#63830](https://github.com/PaddlePaddle/Paddle/pull/63830), [#63939](https://github.com/PaddlePaddle/Paddle/pull/63939), [#64340](https://github.com/PaddlePaddle/Paddle/pull/64340), 
[#64657](https://github.com/PaddlePaddle/Paddle/pull/64657), [#62527](https://github.com/PaddlePaddle/Paddle/pull/62527), [#64088](https://github.com/PaddlePaddle/Paddle/pull/64088), [#60203](https://github.com/PaddlePaddle/Paddle/pull/60203), [#60372](https://github.com/PaddlePaddle/Paddle/pull/60372), [#60685](https://github.com/PaddlePaddle/Paddle/pull/60685), [#60815](https://github.com/PaddlePaddle/Paddle/pull/60815), [#60791](https://github.com/PaddlePaddle/Paddle/pull/60791), [#60864](https://github.com/PaddlePaddle/Paddle/pull/60864), [#60851](https://github.com/PaddlePaddle/Paddle/pull/60851), [#60844](https://github.com/PaddlePaddle/Paddle/pull/60844), [#60694](https://github.com/PaddlePaddle/Paddle/pull/60694), [#60855](https://github.com/PaddlePaddle/Paddle/pull/60855), [#60869](https://github.com/PaddlePaddle/Paddle/pull/60869), [#60948](https://github.com/PaddlePaddle/Paddle/pull/60948), [#61042](https://github.com/PaddlePaddle/Paddle/pull/61042), [#61455](https://github.com/PaddlePaddle/Paddle/pull/61455), [#61580](https://github.com/PaddlePaddle/Paddle/pull/61580), [#61589](https://github.com/PaddlePaddle/Paddle/pull/61589), [#61609](https://github.com/PaddlePaddle/Paddle/pull/61609), [#61616](https://github.com/PaddlePaddle/Paddle/pull/61616), [#61715](https://github.com/PaddlePaddle/Paddle/pull/61715), [#61716](https://github.com/PaddlePaddle/Paddle/pull/61716), [#61759](https://github.com/PaddlePaddle/Paddle/pull/61759), [#61555](https://github.com/PaddlePaddle/Paddle/pull/61555), [#61492](https://github.com/PaddlePaddle/Paddle/pull/61492), [#61805](https://github.com/PaddlePaddle/Paddle/pull/61805), [#61712](https://github.com/PaddlePaddle/Paddle/pull/61712), [#61615](https://github.com/PaddlePaddle/Paddle/pull/61615), [#61713](https://github.com/PaddlePaddle/Paddle/pull/61713), [#62129](https://github.com/PaddlePaddle/Paddle/pull/62129), [#59294](https://github.com/PaddlePaddle/Paddle/pull/59294), 
[#59865](https://github.com/PaddlePaddle/Paddle/pull/59865), [#60270](https://github.com/PaddlePaddle/Paddle/pull/60270), [#60547](https://github.com/PaddlePaddle/Paddle/pull/60547), [#60698](https://github.com/PaddlePaddle/Paddle/pull/60698), [#60762](https://github.com/PaddlePaddle/Paddle/pull/60762), [#60753](https://github.com/PaddlePaddle/Paddle/pull/60753), [#60966](https://github.com/PaddlePaddle/Paddle/pull/60966), [#60976](https://github.com/PaddlePaddle/Paddle/pull/60976), [#61100](https://github.com/PaddlePaddle/Paddle/pull/61100), [#61203](https://github.com/PaddlePaddle/Paddle/pull/61203), [#61210](https://github.com/PaddlePaddle/Paddle/pull/61210), [#61424](https://github.com/PaddlePaddle/Paddle/pull/61424), [#61213](https://github.com/PaddlePaddle/Paddle/pull/61213), [#61275](https://github.com/PaddlePaddle/Paddle/pull/61275), [#61276](https://github.com/PaddlePaddle/Paddle/pull/61276), [#61279](https://github.com/PaddlePaddle/Paddle/pull/61279), [#61292](https://github.com/PaddlePaddle/Paddle/pull/61292), [#61295](https://github.com/PaddlePaddle/Paddle/pull/61295), [#61298](https://github.com/PaddlePaddle/Paddle/pull/61298), [#61299](https://github.com/PaddlePaddle/Paddle/pull/61299), [#61301](https://github.com/PaddlePaddle/Paddle/pull/61301), [#61302](https://github.com/PaddlePaddle/Paddle/pull/61302), [#61329](https://github.com/PaddlePaddle/Paddle/pull/61329), [#61804](https://github.com/PaddlePaddle/Paddle/pull/61804), [#62745](https://github.com/PaddlePaddle/Paddle/pull/62745), [#62909](https://github.com/PaddlePaddle/Paddle/pull/62909), [#64247](https://github.com/PaddlePaddle/Paddle/pull/64247), [#64308](https://github.com/PaddlePaddle/Paddle/pull/64308), [#60690](https://github.com/PaddlePaddle/Paddle/pull/60690), [#61149](https://github.com/PaddlePaddle/Paddle/pull/61149), [#61145](https://github.com/PaddlePaddle/Paddle/pull/61145), [#61193](https://github.com/PaddlePaddle/Paddle/pull/61193), 
[#61207](https://github.com/PaddlePaddle/Paddle/pull/61207), [#61229](https://github.com/PaddlePaddle/Paddle/pull/61229), [#61236](https://github.com/PaddlePaddle/Paddle/pull/61236), [#61244](https://github.com/PaddlePaddle/Paddle/pull/61244), [#61242](https://github.com/PaddlePaddle/Paddle/pull/61242), [#61263](https://github.com/PaddlePaddle/Paddle/pull/61263), [#61370](https://github.com/PaddlePaddle/Paddle/pull/61370), [#61410](https://github.com/PaddlePaddle/Paddle/pull/61410), [#61480](https://github.com/PaddlePaddle/Paddle/pull/61480), [#61522](https://github.com/PaddlePaddle/Paddle/pull/61522), [#61540](https://github.com/PaddlePaddle/Paddle/pull/61540), [#61520](https://github.com/PaddlePaddle/Paddle/pull/61520), [#61625](https://github.com/PaddlePaddle/Paddle/pull/61625), [#61700](https://github.com/PaddlePaddle/Paddle/pull/61700), [#61708](https://github.com/PaddlePaddle/Paddle/pull/61708), [#61736](https://github.com/PaddlePaddle/Paddle/pull/61736), [#61889](https://github.com/PaddlePaddle/Paddle/pull/61889), [#61952](https://github.com/PaddlePaddle/Paddle/pull/61952), [#62033](https://github.com/PaddlePaddle/Paddle/pull/62033), [#62637](https://github.com/PaddlePaddle/Paddle/pull/62637), [#62777](https://github.com/PaddlePaddle/Paddle/pull/62777), [#62779](https://github.com/PaddlePaddle/Paddle/pull/62779), [#63226](https://github.com/PaddlePaddle/Paddle/pull/63226), [#63287](https://github.com/PaddlePaddle/Paddle/pull/63287), [#63398](https://github.com/PaddlePaddle/Paddle/pull/63398), [#63431](https://github.com/PaddlePaddle/Paddle/pull/63431), [#64000](https://github.com/PaddlePaddle/Paddle/pull/64000), [#64058](https://github.com/PaddlePaddle/Paddle/pull/64058), [#64059](https://github.com/PaddlePaddle/Paddle/pull/64059), [#64063](https://github.com/PaddlePaddle/Paddle/pull/64063), [#64066](https://github.com/PaddlePaddle/Paddle/pull/64066), [#64089](https://github.com/PaddlePaddle/Paddle/pull/64089), 
[#64170](https://github.com/PaddlePaddle/Paddle/pull/64170), [#64235](https://github.com/PaddlePaddle/Paddle/pull/64235), [#64237](https://github.com/PaddlePaddle/Paddle/pull/64237), [#64243](https://github.com/PaddlePaddle/Paddle/pull/64243), [#64242](https://github.com/PaddlePaddle/Paddle/pull/64242), [#64286](https://github.com/PaddlePaddle/Paddle/pull/64286), [#64322](https://github.com/PaddlePaddle/Paddle/pull/64322), [#64317](https://github.com/PaddlePaddle/Paddle/pull/64317), [#64490](https://github.com/PaddlePaddle/Paddle/pull/64490), [#60138](https://github.com/PaddlePaddle/Paddle/pull/60138), [#62384](https://github.com/PaddlePaddle/Paddle/pull/62384), [#59702](https://github.com/PaddlePaddle/Paddle/pull/59702), [#60341](https://github.com/PaddlePaddle/Paddle/pull/60341), [#60636](https://github.com/PaddlePaddle/Paddle/pull/60636), [#60714](https://github.com/PaddlePaddle/Paddle/pull/60714), [#60716](https://github.com/PaddlePaddle/Paddle/pull/60716), [#60700](https://github.com/PaddlePaddle/Paddle/pull/60700), [#60702](https://github.com/PaddlePaddle/Paddle/pull/60702), [#60704](https://github.com/PaddlePaddle/Paddle/pull/60704), [#60715](https://github.com/PaddlePaddle/Paddle/pull/60715), [#60713](https://github.com/PaddlePaddle/Paddle/pull/60713), [#60711](https://github.com/PaddlePaddle/Paddle/pull/60711), [#60724](https://github.com/PaddlePaddle/Paddle/pull/60724), [#60803](https://github.com/PaddlePaddle/Paddle/pull/60803), [#61331](https://github.com/PaddlePaddle/Paddle/pull/61331), [#63286](https://github.com/PaddlePaddle/Paddle/pull/63286), [#60473](https://github.com/PaddlePaddle/Paddle/pull/60473), [#61046](https://github.com/PaddlePaddle/Paddle/pull/61046), [#61859](https://github.com/PaddlePaddle/Paddle/pull/61859), [#60675](https://github.com/PaddlePaddle/Paddle/pull/60675), [#60719](https://github.com/PaddlePaddle/Paddle/pull/60719), [#62863](https://github.com/PaddlePaddle/Paddle/pull/62863), 
[#63013](https://github.com/PaddlePaddle/Paddle/pull/63013), [#61293](https://github.com/PaddlePaddle/Paddle/pull/61293), [#62781](https://github.com/PaddlePaddle/Paddle/pull/62781), [#62935](https://github.com/PaddlePaddle/Paddle/pull/62935), [#63014](https://github.com/PaddlePaddle/Paddle/pull/63014), [#64203](https://github.com/PaddlePaddle/Paddle/pull/64203), [#63349](https://github.com/PaddlePaddle/Paddle/pull/63349), [#59572](https://github.com/PaddlePaddle/Paddle/pull/59572), [#59911](https://github.com/PaddlePaddle/Paddle/pull/59911), [#59861](https://github.com/PaddlePaddle/Paddle/pull/59861), [#60014](https://github.com/PaddlePaddle/Paddle/pull/60014), [#59913](https://github.com/PaddlePaddle/Paddle/pull/59913), [#58889](https://github.com/PaddlePaddle/Paddle/pull/58889), [#60114](https://github.com/PaddlePaddle/Paddle/pull/60114), [#59928](https://github.com/PaddlePaddle/Paddle/pull/59928), [#60180](https://github.com/PaddlePaddle/Paddle/pull/60180), [#60168](https://github.com/PaddlePaddle/Paddle/pull/60168), [#60166](https://github.com/PaddlePaddle/Paddle/pull/60166), [#60250](https://github.com/PaddlePaddle/Paddle/pull/60250), [#60247](https://github.com/PaddlePaddle/Paddle/pull/60247), [#60172](https://github.com/PaddlePaddle/Paddle/pull/60172), [#59661](https://github.com/PaddlePaddle/Paddle/pull/59661), [#58880](https://github.com/PaddlePaddle/Paddle/pull/58880), [#60291](https://github.com/PaddlePaddle/Paddle/pull/60291), [#58881](https://github.com/PaddlePaddle/Paddle/pull/58881), [#58955](https://github.com/PaddlePaddle/Paddle/pull/58955), [#58684](https://github.com/PaddlePaddle/Paddle/pull/58684), [#58708](https://github.com/PaddlePaddle/Paddle/pull/58708), [#60323](https://github.com/PaddlePaddle/Paddle/pull/60323), [#58762](https://github.com/PaddlePaddle/Paddle/pull/58762), [#60048](https://github.com/PaddlePaddle/Paddle/pull/60048), [#60345](https://github.com/PaddlePaddle/Paddle/pull/60345), 
[#60325](https://github.com/PaddlePaddle/Paddle/pull/60325), [#59627](https://github.com/PaddlePaddle/Paddle/pull/59627), [#60416](https://github.com/PaddlePaddle/Paddle/pull/60416), [#60434](https://github.com/PaddlePaddle/Paddle/pull/60434), [#59801](https://github.com/PaddlePaddle/Paddle/pull/59801), [#60619](https://github.com/PaddlePaddle/Paddle/pull/60619), [#60445](https://github.com/PaddlePaddle/Paddle/pull/60445), [#60666](https://github.com/PaddlePaddle/Paddle/pull/60666), [#60353](https://github.com/PaddlePaddle/Paddle/pull/60353), [#60733](https://github.com/PaddlePaddle/Paddle/pull/60733), [#60693](https://github.com/PaddlePaddle/Paddle/pull/60693), [#60350](https://github.com/PaddlePaddle/Paddle/pull/60350), [#61096](https://github.com/PaddlePaddle/Paddle/pull/61096), [#61121](https://github.com/PaddlePaddle/Paddle/pull/61121), [#61164](https://github.com/PaddlePaddle/Paddle/pull/61164), [#62054](https://github.com/PaddlePaddle/Paddle/pull/62054), [#62136](https://github.com/PaddlePaddle/Paddle/pull/62136), [#62508](https://github.com/PaddlePaddle/Paddle/pull/62508), [#62988](https://github.com/PaddlePaddle/Paddle/pull/62988), [#63472](https://github.com/PaddlePaddle/Paddle/pull/63472), [#60193](https://github.com/PaddlePaddle/Paddle/pull/60193), [#60197](https://github.com/PaddlePaddle/Paddle/pull/60197), [#60198](https://github.com/PaddlePaddle/Paddle/pull/60198), [#60346](https://github.com/PaddlePaddle/Paddle/pull/60346), [#60318](https://github.com/PaddlePaddle/Paddle/pull/60318), [#60645](https://github.com/PaddlePaddle/Paddle/pull/60645), [#60650](https://github.com/PaddlePaddle/Paddle/pull/60650), [#60660](https://github.com/PaddlePaddle/Paddle/pull/60660), [#60706](https://github.com/PaddlePaddle/Paddle/pull/60706), [#60799](https://github.com/PaddlePaddle/Paddle/pull/60799), [#60837](https://github.com/PaddlePaddle/Paddle/pull/60837), [#60817](https://github.com/PaddlePaddle/Paddle/pull/60817), 
[#60820](https://github.com/PaddlePaddle/Paddle/pull/60820), [#60894](https://github.com/PaddlePaddle/Paddle/pull/60894), [#61079](https://github.com/PaddlePaddle/Paddle/pull/61079), [#61087](https://github.com/PaddlePaddle/Paddle/pull/61087), [#61073](https://github.com/PaddlePaddle/Paddle/pull/61073), [#61072](https://github.com/PaddlePaddle/Paddle/pull/61072), [#61127](https://github.com/PaddlePaddle/Paddle/pull/61127), [#61097](https://github.com/PaddlePaddle/Paddle/pull/61097), [#61365](https://github.com/PaddlePaddle/Paddle/pull/61365), [#61456](https://github.com/PaddlePaddle/Paddle/pull/61456), [#61846](https://github.com/PaddlePaddle/Paddle/pull/61846), [#62217](https://github.com/PaddlePaddle/Paddle/pull/62217), [#62519](https://github.com/PaddlePaddle/Paddle/pull/62519), [#62881](https://github.com/PaddlePaddle/Paddle/pull/62881), [#62880](https://github.com/PaddlePaddle/Paddle/pull/62880), [#59723](https://github.com/PaddlePaddle/Paddle/pull/59723), [#59722](https://github.com/PaddlePaddle/Paddle/pull/59722), [#59797](https://github.com/PaddlePaddle/Paddle/pull/59797), [#59960](https://github.com/PaddlePaddle/Paddle/pull/59960), [#59761](https://github.com/PaddlePaddle/Paddle/pull/59761), [#59996](https://github.com/PaddlePaddle/Paddle/pull/59996), [#60009](https://github.com/PaddlePaddle/Paddle/pull/60009), [#58896](https://github.com/PaddlePaddle/Paddle/pull/58896), [#60051](https://github.com/PaddlePaddle/Paddle/pull/60051), [#60410](https://github.com/PaddlePaddle/Paddle/pull/60410), [#60420](https://github.com/PaddlePaddle/Paddle/pull/60420), [#60548](https://github.com/PaddlePaddle/Paddle/pull/60548), [#60575](https://github.com/PaddlePaddle/Paddle/pull/60575), [#60726](https://github.com/PaddlePaddle/Paddle/pull/60726), [#60809](https://github.com/PaddlePaddle/Paddle/pull/60809), [#61346](https://github.com/PaddlePaddle/Paddle/pull/61346), [#61222](https://github.com/PaddlePaddle/Paddle/pull/61222), 
[#61099](https://github.com/PaddlePaddle/Paddle/pull/61099), [#62254](https://github.com/PaddlePaddle/Paddle/pull/62254), [#62269](https://github.com/PaddlePaddle/Paddle/pull/62269), [#62362](https://github.com/PaddlePaddle/Paddle/pull/62362)
-- Improve Paddle's low-level error checking and related mechanisms to make debugging easier for developers. [#62571](https://github.com/PaddlePaddle/Paddle/pull/62571), [#62602](https://github.com/PaddlePaddle/Paddle/pull/62602), [#60903](https://github.com/PaddlePaddle/Paddle/pull/60903), [#64695](https://github.com/PaddlePaddle/Paddle/pull/64695), [#59907](https://github.com/PaddlePaddle/Paddle/pull/59907), [#62018](https://github.com/PaddlePaddle/Paddle/pull/62018), [#62839](https://github.com/PaddlePaddle/Paddle/pull/62839), [#60651](https://github.com/PaddlePaddle/Paddle/pull/60651), [#61488](https://github.com/PaddlePaddle/Paddle/pull/61488), [#64064](https://github.com/PaddlePaddle/Paddle/pull/64064), [#63192](https://github.com/PaddlePaddle/Paddle/pull/63192), [#63525](https://github.com/PaddlePaddle/Paddle/pull/63525).
+- Fix accuracy issues caused by parameter configuration. [#65814](https://github.com/PaddlePaddle/Paddle/pull/65814)
+- Fix save/load-related bugs. [#65268](https://github.com/PaddlePaddle/Paddle/pull/65268), [#65359](https://github.com/PaddlePaddle/Paddle/pull/65359), [#65373](https://github.com/PaddlePaddle/Paddle/pull/65373), [#65314](https://github.com/PaddlePaddle/Paddle/pull/65314), [#65446](https://github.com/PaddlePaddle/Paddle/pull/65446), [#65476](https://github.com/PaddlePaddle/Paddle/pull/65476), [#66891](https://github.com/PaddlePaddle/Paddle/pull/66891), [#66931](https://github.com/PaddlePaddle/Paddle/pull/66931), [#65978](https://github.com/PaddlePaddle/Paddle/pull/65978), [#67654](https://github.com/PaddlePaddle/Paddle/pull/67654), [#67906](https://github.com/PaddlePaddle/Paddle/pull/67906), [#68723](https://github.com/PaddlePaddle/Paddle/pull/68723), [#71452](https://github.com/PaddlePaddle/Paddle/pull/71452), [#71457](https://github.com/PaddlePaddle/Paddle/pull/71457),
[#67819](https://github.com/PaddlePaddle/Paddle/pull/67819), [#68120](https://github.com/PaddlePaddle/Paddle/pull/68120), [#68300](https://github.com/PaddlePaddle/Paddle/pull/68300), [#68315](https://github.com/PaddlePaddle/Paddle/pull/68315), [#68743](https://github.com/PaddlePaddle/Paddle/pull/68743), [#68744](https://github.com/PaddlePaddle/Paddle/pull/68744), [#69585](https://github.com/PaddlePaddle/Paddle/pull/69585), [#71165](https://github.com/PaddlePaddle/Paddle/pull/71165), [#71400](https://github.com/PaddlePaddle/Paddle/pull/71400)
+- Skip or fix unit tests that fail in PIR mode, including Windows and XPU scenarios. [#65690](https://github.com/PaddlePaddle/Paddle/pull/65690), [#65759](https://github.com/PaddlePaddle/Paddle/pull/65759), [#65730](https://github.com/PaddlePaddle/Paddle/pull/65730), [#65760](https://github.com/PaddlePaddle/Paddle/pull/65760), [#65833](https://github.com/PaddlePaddle/Paddle/pull/65833), [#65834](https://github.com/PaddlePaddle/Paddle/pull/65834), [#65856](https://github.com/PaddlePaddle/Paddle/pull/65856), [#65886](https://github.com/PaddlePaddle/Paddle/pull/65886), [#65899](https://github.com/PaddlePaddle/Paddle/pull/65899), [#65932](https://github.com/PaddlePaddle/Paddle/pull/65932), [#65998](https://github.com/PaddlePaddle/Paddle/pull/65998), [#65953](https://github.com/PaddlePaddle/Paddle/pull/65953), [#65997](https://github.com/PaddlePaddle/Paddle/pull/65997), [#66061](https://github.com/PaddlePaddle/Paddle/pull/66061), [#66111](https://github.com/PaddlePaddle/Paddle/pull/66111), [#66137](https://github.com/PaddlePaddle/Paddle/pull/66137), [#66073](https://github.com/PaddlePaddle/Paddle/pull/66073), [#66203](https://github.com/PaddlePaddle/Paddle/pull/66203), [#66227](https://github.com/PaddlePaddle/Paddle/pull/66227), [#65744](https://github.com/PaddlePaddle/Paddle/pull/65744), [#66234](https://github.com/PaddlePaddle/Paddle/pull/66234), [#67487](https://github.com/PaddlePaddle/Paddle/pull/67487), [#67561](https://github.com/PaddlePaddle/Paddle/pull/67561),
[#67584](https://github.com/PaddlePaddle/Paddle/pull/67584), [#67742](https://github.com/PaddlePaddle/Paddle/pull/67742), [#69832](https://github.com/PaddlePaddle/Paddle/pull/69832), [#65885](https://github.com/PaddlePaddle/Paddle/pull/65885), [#66709](https://github.com/PaddlePaddle/Paddle/pull/66709), [#66734](https://github.com/PaddlePaddle/Paddle/pull/66734), [#66959](https://github.com/PaddlePaddle/Paddle/pull/66959), [#67399](https://github.com/PaddlePaddle/Paddle/pull/67399), [#67389](https://github.com/PaddlePaddle/Paddle/pull/67389), [#67230](https://github.com/PaddlePaddle/Paddle/pull/67230), [#67403](https://github.com/PaddlePaddle/Paddle/pull/67403), [#67619](https://github.com/PaddlePaddle/Paddle/pull/67619), [#67662](https://github.com/PaddlePaddle/Paddle/pull/67662), [#67902](https://github.com/PaddlePaddle/Paddle/pull/67902), [#67382](https://github.com/PaddlePaddle/Paddle/pull/67382), [#67430](https://github.com/PaddlePaddle/Paddle/pull/67430), [#67517](https://github.com/PaddlePaddle/Paddle/pull/67517), [#67533](https://github.com/PaddlePaddle/Paddle/pull/67533), [#67573](https://github.com/PaddlePaddle/Paddle/pull/67573), [#67468](https://github.com/PaddlePaddle/Paddle/pull/67468), [#67640](https://github.com/PaddlePaddle/Paddle/pull/67640), [#67667](https://github.com/PaddlePaddle/Paddle/pull/67667), [#67716](https://github.com/PaddlePaddle/Paddle/pull/67716), [#68386](https://github.com/PaddlePaddle/Paddle/pull/68386), [#67234](https://github.com/PaddlePaddle/Paddle/pull/67234), [#67266](https://github.com/PaddlePaddle/Paddle/pull/67266), [#67362](https://github.com/PaddlePaddle/Paddle/pull/67362), [#67631](https://github.com/PaddlePaddle/Paddle/pull/67631), [#68081](https://github.com/PaddlePaddle/Paddle/pull/68081)
+- Fix dynamic graph-related bugs. [#65619](https://github.com/PaddlePaddle/Paddle/pull/65619), [#69163](https://github.com/PaddlePaddle/Paddle/pull/69163), [#68862](https://github.com/PaddlePaddle/Paddle/pull/68862),
[#68164](https://github.com/PaddlePaddle/Paddle/pull/68164), [#69867](https://github.com/PaddlePaddle/Paddle/pull/69867)
+- Fix control flow-related bugs. [#65722](https://github.com/PaddlePaddle/Paddle/pull/65722), [#70181](https://github.com/PaddlePaddle/Paddle/pull/70181)
+- Fix kernel computation bugs, including computation placement and null pointers. [#66334](https://github.com/PaddlePaddle/Paddle/pull/66334), [#67931](https://github.com/PaddlePaddle/Paddle/pull/67931), [#70353](https://github.com/PaddlePaddle/Paddle/pull/70353)
+- Fix AMP-related bugs. [#66778](https://github.com/PaddlePaddle/Paddle/pull/66778), [#67582](https://github.com/PaddlePaddle/Paddle/pull/67582), [#67704](https://github.com/PaddlePaddle/Paddle/pull/67704), [#68655](https://github.com/PaddlePaddle/Paddle/pull/68655)
+- Fix CINN-related bugs. [#69577](https://github.com/PaddlePaddle/Paddle/pull/69577), [#71101](https://github.com/PaddlePaddle/Paddle/pull/71101), [#71387](https://github.com/PaddlePaddle/Paddle/pull/71387), [#71401](https://github.com/PaddlePaddle/Paddle/pull/71401)
+- Fix dynamic-to-static conversion bugs. [#67617](https://github.com/PaddlePaddle/Paddle/pull/67617), [#67936](https://github.com/PaddlePaddle/Paddle/pull/67936), [#68938](https://github.com/PaddlePaddle/Paddle/pull/68938), [#68734](https://github.com/PaddlePaddle/Paddle/pull/68734), [#69010](https://github.com/PaddlePaddle/Paddle/pull/69010), [#69408](https://github.com/PaddlePaddle/Paddle/pull/69408), [#69461](https://github.com/PaddlePaddle/Paddle/pull/69461), [#69699](https://github.com/PaddlePaddle/Paddle/pull/69699), [#69774](https://github.com/PaddlePaddle/Paddle/pull/69774), [#69803](https://github.com/PaddlePaddle/Paddle/pull/69803), [#69853](https://github.com/PaddlePaddle/Paddle/pull/69853), [#70510](https://github.com/PaddlePaddle/Paddle/pull/70510), [#70830](https://github.com/PaddlePaddle/Paddle/pull/70830), [#70904](https://github.com/PaddlePaddle/Paddle/pull/70904), [#70913](https://github.com/PaddlePaddle/Paddle/pull/70913), [#71040](https://github.com/PaddlePaddle/Paddle/pull/71040),
[#71048](https://github.com/PaddlePaddle/Paddle/pull/71048), [#71106](https://github.com/PaddlePaddle/Paddle/pull/71106), [#71201](https://github.com/PaddlePaddle/Paddle/pull/71201), [#71216](https://github.com/PaddlePaddle/Paddle/pull/71216), [#71223](https://github.com/PaddlePaddle/Paddle/pull/71223), [#71296](https://github.com/PaddlePaddle/Paddle/pull/71296), [#71385](https://github.com/PaddlePaddle/Paddle/pull/71385), [#71505](https://github.com/PaddlePaddle/Paddle/pull/71505), [#66934](https://github.com/PaddlePaddle/Paddle/pull/66934), [#71096](https://github.com/PaddlePaddle/Paddle/pull/71096), [#71144](https://github.com/PaddlePaddle/Paddle/pull/71144), [#71430](https://github.com/PaddlePaddle/Paddle/pull/71430), [#71437](https://github.com/PaddlePaddle/Paddle/pull/71437), [#71473](https://github.com/PaddlePaddle/Paddle/pull/71473), [#71412](https://github.com/PaddlePaddle/Paddle/pull/71412), [#65648](https://github.com/PaddlePaddle/Paddle/pull/65648), [#67853](https://github.com/PaddlePaddle/Paddle/pull/67853), [#66543](https://github.com/PaddlePaddle/Paddle/pull/66543), [#68229](https://github.com/PaddlePaddle/Paddle/pull/68229), [#70846](https://github.com/PaddlePaddle/Paddle/pull/70846), [#67532](https://github.com/PaddlePaddle/Paddle/pull/67532)
+- Fix other bugs, including backward gradient computation, memory copy, and executor errors. [#65493](https://github.com/PaddlePaddle/Paddle/pull/65493), [#65678](https://github.com/PaddlePaddle/Paddle/pull/65678), [#65673](https://github.com/PaddlePaddle/Paddle/pull/65673), [#65794](https://github.com/PaddlePaddle/Paddle/pull/65794), [#66358](https://github.com/PaddlePaddle/Paddle/pull/66358), [#66875](https://github.com/PaddlePaddle/Paddle/pull/66875), [#67339](https://github.com/PaddlePaddle/Paddle/pull/67339), [#67465](https://github.com/PaddlePaddle/Paddle/pull/67465), [#67754](https://github.com/PaddlePaddle/Paddle/pull/67754), [#67835](https://github.com/PaddlePaddle/Paddle/pull/67835), [#67892](https://github.com/PaddlePaddle/Paddle/pull/67892),
[#67967](https://github.com/PaddlePaddle/Paddle/pull/67967), [#67952](https://github.com/PaddlePaddle/Paddle/pull/67952), [#68036](https://github.com/PaddlePaddle/Paddle/pull/68036), [#68063](https://github.com/PaddlePaddle/Paddle/pull/68063), [#68128](https://github.com/PaddlePaddle/Paddle/pull/68128), [#68151](https://github.com/PaddlePaddle/Paddle/pull/68151), [#68140](https://github.com/PaddlePaddle/Paddle/pull/68140), [#68167](https://github.com/PaddlePaddle/Paddle/pull/68167), [#68200](https://github.com/PaddlePaddle/Paddle/pull/68200), [#68325](https://github.com/PaddlePaddle/Paddle/pull/68325), [#68376](https://github.com/PaddlePaddle/Paddle/pull/68376), [#68539](https://github.com/PaddlePaddle/Paddle/pull/68539), [#68530](https://github.com/PaddlePaddle/Paddle/pull/68530), [#68637](https://github.com/PaddlePaddle/Paddle/pull/68637), [#68639](https://github.com/PaddlePaddle/Paddle/pull/68639), [#68688](https://github.com/PaddlePaddle/Paddle/pull/68688), [#68751](https://github.com/PaddlePaddle/Paddle/pull/68751), [#68806](https://github.com/PaddlePaddle/Paddle/pull/68806), [#68810](https://github.com/PaddlePaddle/Paddle/pull/68810), [#68779](https://github.com/PaddlePaddle/Paddle/pull/68779), [#68811](https://github.com/PaddlePaddle/Paddle/pull/68811), [#68844](https://github.com/PaddlePaddle/Paddle/pull/68844), [#68790](https://github.com/PaddlePaddle/Paddle/pull/68790), [#68870](https://github.com/PaddlePaddle/Paddle/pull/68870), [#68960](https://github.com/PaddlePaddle/Paddle/pull/68960), [#68999](https://github.com/PaddlePaddle/Paddle/pull/68999), [#69036](https://github.com/PaddlePaddle/Paddle/pull/69036), [#69188](https://github.com/PaddlePaddle/Paddle/pull/69188), [#69234](https://github.com/PaddlePaddle/Paddle/pull/69234), [#69375](https://github.com/PaddlePaddle/Paddle/pull/69375), [#69399](https://github.com/PaddlePaddle/Paddle/pull/69399), [#69538](https://github.com/PaddlePaddle/Paddle/pull/69538), 
[#69603](https://github.com/PaddlePaddle/Paddle/pull/69603), [#69633](https://github.com/PaddlePaddle/Paddle/pull/69633), [#69765](https://github.com/PaddlePaddle/Paddle/pull/69765), [#69768](https://github.com/PaddlePaddle/Paddle/pull/69768), [#69821](https://github.com/PaddlePaddle/Paddle/pull/69821), [#70091](https://github.com/PaddlePaddle/Paddle/pull/70091), [#70123](https://github.com/PaddlePaddle/Paddle/pull/70123), [#70147](https://github.com/PaddlePaddle/Paddle/pull/70147), [#70201](https://github.com/PaddlePaddle/Paddle/pull/70201), [#70198](https://github.com/PaddlePaddle/Paddle/pull/70198), [#69815](https://github.com/PaddlePaddle/Paddle/pull/69815), [#70420](https://github.com/PaddlePaddle/Paddle/pull/70420), [#70377](https://github.com/PaddlePaddle/Paddle/pull/70377), [#70552](https://github.com/PaddlePaddle/Paddle/pull/70552), [#70545](https://github.com/PaddlePaddle/Paddle/pull/70545), [#70595](https://github.com/PaddlePaddle/Paddle/pull/70595), [#70836](https://github.com/PaddlePaddle/Paddle/pull/70836), [#70771](https://github.com/PaddlePaddle/Paddle/pull/70771), [#70922](https://github.com/PaddlePaddle/Paddle/pull/70922), [#70969](https://github.com/PaddlePaddle/Paddle/pull/70969), [#70926](https://github.com/PaddlePaddle/Paddle/pull/70926), [#71117](https://github.com/PaddlePaddle/Paddle/pull/71117), [#71151](https://github.com/PaddlePaddle/Paddle/pull/71151), [#71194](https://github.com/PaddlePaddle/Paddle/pull/71194), [#71234](https://github.com/PaddlePaddle/Paddle/pull/71234), [#71339](https://github.com/PaddlePaddle/Paddle/pull/71339), [#71445](https://github.com/PaddlePaddle/Paddle/pull/71445), [#66350](https://github.com/PaddlePaddle/Paddle/pull/66350), [#66533](https://github.com/PaddlePaddle/Paddle/pull/66533), [#66622](https://github.com/PaddlePaddle/Paddle/pull/66622), [#67721](https://github.com/PaddlePaddle/Paddle/pull/67721), [#67700](https://github.com/PaddlePaddle/Paddle/pull/67700), 
[#69207](https://github.com/PaddlePaddle/Paddle/pull/69207), [#69615](https://github.com/PaddlePaddle/Paddle/pull/69615), [#69785](https://github.com/PaddlePaddle/Paddle/pull/69785), [#67805](https://github.com/PaddlePaddle/Paddle/pull/67805)
+
+### Feature Improvements
+
+- Support save/load. [#65296](https://github.com/PaddlePaddle/Paddle/pull/65296), [#65671](https://github.com/PaddlePaddle/Paddle/pull/65671), [#66231](https://github.com/PaddlePaddle/Paddle/pull/66231), [#66185](https://github.com/PaddlePaddle/Paddle/pull/66185), [#66722](https://github.com/PaddlePaddle/Paddle/pull/66722), [#66863](https://github.com/PaddlePaddle/Paddle/pull/66863), [#67057](https://github.com/PaddlePaddle/Paddle/pull/67057), [#68101](https://github.com/PaddlePaddle/Paddle/pull/68101), [#68628](https://github.com/PaddlePaddle/Paddle/pull/68628), [#66359](https://github.com/PaddlePaddle/Paddle/pull/66359), [#68481](https://github.com/PaddlePaddle/Paddle/pull/68481)
+- Optimize the custom operator compilation workflow. [#67615](https://github.com/PaddlePaddle/Paddle/pull/67615), [#67659](https://github.com/PaddlePaddle/Paddle/pull/67659)
+- Support composite operators. [#69121](https://github.com/PaddlePaddle/Paddle/pull/69121), [#69144](https://github.com/PaddlePaddle/Paddle/pull/69144), [#70204](https://github.com/PaddlePaddle/Paddle/pull/70204), [#71098](https://github.com/PaddlePaddle/Paddle/pull/71098), [#71335](https://github.com/PaddlePaddle/Paddle/pull/71335)
+- Support execution with the CINN compiler. [#69589](https://github.com/PaddlePaddle/Paddle/pull/69589), [#70115](https://github.com/PaddlePaddle/Paddle/pull/70115)
+- Support custom devices. [#70909](https://github.com/PaddlePaddle/Paddle/pull/70909), [#71294](https://github.com/PaddlePaddle/Paddle/pull/71294), [#71362](https://github.com/PaddlePaddle/Paddle/pull/71362), [#71010](https://github.com/PaddlePaddle/Paddle/pull/71010), [#71036](https://github.com/PaddlePaddle/Paddle/pull/71036), [#70637](https://github.com/PaddlePaddle/Paddle/pull/70637), [#71085](https://github.com/PaddlePaddle/Paddle/pull/71085)
+- Execution support for other scenarios.
[#65050](https://github.com/PaddlePaddle/Paddle/pull/65050), [#65664](https://github.com/PaddlePaddle/Paddle/pull/65664), [#65741](https://github.com/PaddlePaddle/Paddle/pull/65741), [#65786](https://github.com/PaddlePaddle/Paddle/pull/65786), [#65499](https://github.com/PaddlePaddle/Paddle/pull/65499), [#66441](https://github.com/PaddlePaddle/Paddle/pull/66441), [#67668](https://github.com/PaddlePaddle/Paddle/pull/67668), [#68199](https://github.com/PaddlePaddle/Paddle/pull/68199), [#69088](https://github.com/PaddlePaddle/Paddle/pull/69088), [#70199](https://github.com/PaddlePaddle/Paddle/pull/70199), [#70308](https://github.com/PaddlePaddle/Paddle/pull/70308), [#70709](https://github.com/PaddlePaddle/Paddle/pull/70709), [#70937](https://github.com/PaddlePaddle/Paddle/pull/70937), [#71066](https://github.com/PaddlePaddle/Paddle/pull/71066), [#71079](https://github.com/PaddlePaddle/Paddle/pull/71079), [#71121](https://github.com/PaddlePaddle/Paddle/pull/71121), [#71136](https://github.com/PaddlePaddle/Paddle/pull/71136), [#71205](https://github.com/PaddlePaddle/Paddle/pull/71205)
+
+### New Features
+
+- Adapt SOT to Python 3.13 bytecode, enabling dynamic-to-static conversion in SOT mode under Python 3.13. [#68071](https://github.com/PaddlePaddle/Paddle/pull/68071), [#69126](https://github.com/PaddlePaddle/Paddle/pull/69126), [#69131](https://github.com/PaddlePaddle/Paddle/pull/69131), [#69196](https://github.com/PaddlePaddle/Paddle/pull/69196), [#69232](https://github.com/PaddlePaddle/Paddle/pull/69232), [#69253](https://github.com/PaddlePaddle/Paddle/pull/69253), [#69267](https://github.com/PaddlePaddle/Paddle/pull/69267), [#69412](https://github.com/PaddlePaddle/Paddle/pull/69412), [#69431](https://github.com/PaddlePaddle/Paddle/pull/69431), [#69432](https://github.com/PaddlePaddle/Paddle/pull/69432), [#69436](https://github.com/PaddlePaddle/Paddle/pull/69436), [#69557](https://github.com/PaddlePaddle/Paddle/pull/69557), [#69567](https://github.com/PaddlePaddle/Paddle/pull/69567),
[#69700](https://github.com/PaddlePaddle/Paddle/pull/69700), [#69707](https://github.com/PaddlePaddle/Paddle/pull/69707), [#69735](https://github.com/PaddlePaddle/Paddle/pull/69735), [#69738](https://github.com/PaddlePaddle/Paddle/pull/69738), [#69744](https://github.com/PaddlePaddle/Paddle/pull/69744), [#69753](https://github.com/PaddlePaddle/Paddle/pull/69753), [#69887](https://github.com/PaddlePaddle/Paddle/pull/69887), [#69920](https://github.com/PaddlePaddle/Paddle/pull/69920), [#69950](https://github.com/PaddlePaddle/Paddle/pull/69950), [#70319](https://github.com/PaddlePaddle/Paddle/pull/70319), [#70927](https://github.com/PaddlePaddle/Paddle/pull/70927)
+- Adapt to custom devices. [#68061](https://github.com/PaddlePaddle/Paddle/pull/68061), [#68836](https://github.com/PaddlePaddle/Paddle/pull/68836), [#70366](https://github.com/PaddlePaddle/Paddle/pull/70366), [#70549](https://github.com/PaddlePaddle/Paddle/pull/70549)
+- Adapt to PIR forward execution. [#65335](https://github.com/PaddlePaddle/Paddle/pull/65335)
+- Adapt to save/load. [#67910](https://github.com/PaddlePaddle/Paddle/pull/67910)
+- Adapt to pylayer. [#70335](https://github.com/PaddlePaddle/Paddle/pull/70335)
+- Adapt to lazy_init. [#67379](https://github.com/PaddlePaddle/Paddle/pull/67379), [#67467](https://github.com/PaddlePaddle/Paddle/pull/67467)
+- Optimize logic under PIR. [#67961](https://github.com/PaddlePaddle/Paddle/pull/67961)
+- Support for other scenarios. [#68344](https://github.com/PaddlePaddle/Paddle/pull/68344), [#70071](https://github.com/PaddlePaddle/Paddle/pull/70071), [#70291](https://github.com/PaddlePaddle/Paddle/pull/70291), [#70752](https://github.com/PaddlePaddle/Paddle/pull/70752), [#70812](https://github.com/PaddlePaddle/Paddle/pull/70812), [#71033](https://github.com/PaddlePaddle/Paddle/pull/71033)
+
+### Changes Not Relevant to General Users
+
+- Improve the SOT debugging experience and development efficiency. [#67560](https://github.com/PaddlePaddle/Paddle/pull/67560), [#69072](https://github.com/PaddlePaddle/Paddle/pull/69072), [#69837](https://github.com/PaddlePaddle/Paddle/pull/69837),
[#70134](https://github.com/PaddlePaddle/Paddle/pull/70134), [#70387](https://github.com/PaddlePaddle/Paddle/pull/70387), [#70740](https://github.com/PaddlePaddle/Paddle/pull/70740), [#71118](https://github.com/PaddlePaddle/Paddle/pull/71118), [#71268](https://github.com/PaddlePaddle/Paddle/pull/71268), [#71275](https://github.com/PaddlePaddle/Paddle/pull/71275), [#71458](https://github.com/PaddlePaddle/Paddle/pull/71458), [#71460](https://github.com/PaddlePaddle/Paddle/pull/71460)
+- Other changes unrelated to user-facing usage. [#65393](https://github.com/PaddlePaddle/Paddle/pull/65393), [#65795](https://github.com/PaddlePaddle/Paddle/pull/65795), [#65799](https://github.com/PaddlePaddle/Paddle/pull/65799), [#65911](https://github.com/PaddlePaddle/Paddle/pull/65911), [#65977](https://github.com/PaddlePaddle/Paddle/pull/65977), [#66982](https://github.com/PaddlePaddle/Paddle/pull/66982), [#67563](https://github.com/PaddlePaddle/Paddle/pull/67563), [#68761](https://github.com/PaddlePaddle/Paddle/pull/68761), [#68909](https://github.com/PaddlePaddle/Paddle/pull/68909), [#69130](https://github.com/PaddlePaddle/Paddle/pull/69130), [#69233](https://github.com/PaddlePaddle/Paddle/pull/69233), [#69956](https://github.com/PaddlePaddle/Paddle/pull/69956), [#71142](https://github.com/PaddlePaddle/Paddle/pull/71142)
+
+### Security Issues
-### Vulnerability Fixes
-- Fix potential security vulnerabilities. [#59957](https://github.com/PaddlePaddle/Paddle/pull/59957), [#61032](https://github.com/PaddlePaddle/Paddle/pull/61032), [#61356](https://github.com/PaddlePaddle/Paddle/pull/61356), [#61573](https://github.com/PaddlePaddle/Paddle/pull/61573), [#61671](https://github.com/PaddlePaddle/Paddle/pull/61671), [#62345](https://github.com/PaddlePaddle/Paddle/pull/62345), [#60097](https://github.com/PaddlePaddle/Paddle/pull/60097), [#61161](https://github.com/PaddlePaddle/Paddle/pull/61161), [#61294](https://github.com/PaddlePaddle/Paddle/pull/61294), [#61349](https://github.com/PaddlePaddle/Paddle/pull/61349),
[#61344](https://github.com/PaddlePaddle/Paddle/pull/61344), [#61162](https://github.com/PaddlePaddle/Paddle/pull/61162), [#61285](https://github.com/PaddlePaddle/Paddle/pull/61285), [#61826](https://github.com/PaddlePaddle/Paddle/pull/61826), [#59967](https://github.com/PaddlePaddle/Paddle/pull/59967), [#59976](https://github.com/PaddlePaddle/Paddle/pull/59976), [#59979](https://github.com/PaddlePaddle/Paddle/pull/59979), [#60527](https://github.com/PaddlePaddle/Paddle/pull/60527), [#60646](https://github.com/PaddlePaddle/Paddle/pull/60646), [#61827](https://github.com/PaddlePaddle/Paddle/pull/61827)
+- Introduce approval rules for IR (intermediate representation) save/load operations to strengthen security and governance during model serialization. [#65737](https://github.com/PaddlePaddle/Paddle/pull/65737)
+
+### Others
+
+- Migrate Sparse APIs. [#66139](https://github.com/PaddlePaddle/Paddle/pull/66139), [#66319](https://github.com/PaddlePaddle/Paddle/pull/66319), [#66866](https://github.com/PaddlePaddle/Paddle/pull/66866)
+- Extend PIR functionality. [#67966](https://github.com/PaddlePaddle/Paddle/pull/67966), [#69909](https://github.com/PaddlePaddle/Paddle/pull/69909)
+- Relocate files. [#66477](https://github.com/PaddlePaddle/Paddle/pull/66477), [#66824](https://github.com/PaddlePaddle/Paddle/pull/66824), [#67592](https://github.com/PaddlePaddle/Paddle/pull/67592)
+- Add logging. [#68382](https://github.com/PaddlePaddle/Paddle/pull/68382), [#70506](https://github.com/PaddlePaddle/Paddle/pull/70506)
+- Enable PIR by default. [#68278](https://github.com/PaddlePaddle/Paddle/pull/68278)
+- Clean up header files. [#68422](https://github.com/PaddlePaddle/Paddle/pull/68422), [#68471](https://github.com/PaddlePaddle/Paddle/pull/68471)
+- Compilation optimizations. [#67831](https://github.com/PaddlePaddle/Paddle/pull/67831), [#67821](https://github.com/PaddlePaddle/Paddle/pull/67821), [#68717](https://github.com/PaddlePaddle/Paddle/pull/68717)
+- Manage related tests with guards. [#67816](https://github.com/PaddlePaddle/Paddle/pull/67816), [#67827](https://github.com/PaddlePaddle/Paddle/pull/67827), [#67989](https://github.com/PaddlePaddle/Paddle/pull/67989)
+- Fix typos.
[#70784](https://github.com/PaddlePaddle/Paddle/pull/70784), [#70787](https://github.com/PaddlePaddle/Paddle/pull/70787) +- 检查 cuda 错误。 [#70399](https://github.com/PaddlePaddle/Paddle/pull/70399) + +### 开发者 + +- 动转静功能修复,提升整图转换成功率,优化推理导出体验。[#65291](https://github.com/PaddlePaddle/Paddle/pull/65291), [#66153](https://github.com/PaddlePaddle/Paddle/pull/66153), [#66379](https://github.com/PaddlePaddle/Paddle/pull/66379), [#66557](https://github.com/PaddlePaddle/Paddle/pull/66557), [#67021](https://github.com/PaddlePaddle/Paddle/pull/67021), [#67482](https://github.com/PaddlePaddle/Paddle/pull/67482), [#67495](https://github.com/PaddlePaddle/Paddle/pull/67495), [#67981](https://github.com/PaddlePaddle/Paddle/pull/67981), [#68030](https://github.com/PaddlePaddle/Paddle/pull/68030), [#68078](https://github.com/PaddlePaddle/Paddle/pull/68078), [#68328](https://github.com/PaddlePaddle/Paddle/pull/68328), [#68442](https://github.com/PaddlePaddle/Paddle/pull/68442), [#68679](https://github.com/PaddlePaddle/Paddle/pull/68679), [#68850](https://github.com/PaddlePaddle/Paddle/pull/68850), [#68892](https://github.com/PaddlePaddle/Paddle/pull/68892), [#68991](https://github.com/PaddlePaddle/Paddle/pull/68991), [#69043](https://github.com/PaddlePaddle/Paddle/pull/69043), [#69097](https://github.com/PaddlePaddle/Paddle/pull/69097), [#69210](https://github.com/PaddlePaddle/Paddle/pull/69210), [#69295](https://github.com/PaddlePaddle/Paddle/pull/69295), [#69428](https://github.com/PaddlePaddle/Paddle/pull/69428), [#69518](https://github.com/PaddlePaddle/Paddle/pull/69518), [#69642](https://github.com/PaddlePaddle/Paddle/pull/69642), [#69940](https://github.com/PaddlePaddle/Paddle/pull/69940), [#70118](https://github.com/PaddlePaddle/Paddle/pull/70118), [#70169](https://github.com/PaddlePaddle/Paddle/pull/70169), [#70218](https://github.com/PaddlePaddle/Paddle/pull/70218), [#70287](https://github.com/PaddlePaddle/Paddle/pull/70287), 
[#70412](https://github.com/PaddlePaddle/Paddle/pull/70412), [#71099](https://github.com/PaddlePaddle/Paddle/pull/71099), [#71156](https://github.com/PaddlePaddle/Paddle/pull/71156), [#71193](https://github.com/PaddlePaddle/Paddle/pull/71193), [#71336](https://github.com/PaddlePaddle/Paddle/pull/71336), [#71463](https://github.com/PaddlePaddle/Paddle/pull/71463), [#71476](https://github.com/PaddlePaddle/Paddle/pull/71476), [#71503](https://github.com/PaddlePaddle/Paddle/pull/71503) +- Inplace 策略升级。 [#65491](https://github.com/PaddlePaddle/Paddle/pull/65491) +- 控制流相关开发。 [#67251](https://github.com/PaddlePaddle/Paddle/pull/67251) +- 添加环境变量。 [#68467](https://github.com/PaddlePaddle/Paddle/pull/68467) +- 支持稀疏算子运算。 [#67111](https://github.com/PaddlePaddle/Paddle/pull/67111) +- 其他执行支持开发,包括逻辑优化、版本适配、添加单测等。 [#69241](https://github.com/PaddlePaddle/Paddle/pull/69241), [#69806](https://github.com/PaddlePaddle/Paddle/pull/69806), [#70768](https://github.com/PaddlePaddle/Paddle/pull/70768), [#66829](https://github.com/PaddlePaddle/Paddle/pull/66829), [#67110](https://github.com/PaddlePaddle/Paddle/pull/67110), [#67442](https://github.com/PaddlePaddle/Paddle/pull/67442), [#67041](https://github.com/PaddlePaddle/Paddle/pull/67041), [#67452](https://github.com/PaddlePaddle/Paddle/pull/67452), [#69061](https://github.com/PaddlePaddle/Paddle/pull/69061), [#69307](https://github.com/PaddlePaddle/Paddle/pull/69307), [#68669](https://github.com/PaddlePaddle/Paddle/pull/68669), [#69829](https://github.com/PaddlePaddle/Paddle/pull/69829), [#70003](https://github.com/PaddlePaddle/Paddle/pull/70003), [#70443](https://github.com/PaddlePaddle/Paddle/pull/70443), [#70364](https://github.com/PaddlePaddle/Paddle/pull/70364), [#71495](https://github.com/PaddlePaddle/Paddle/pull/71495) + +### 性能优化 + +- 优化动态 shape 场景转静能力,降低构图次数,减少编译时间。[#65235](https://github.com/PaddlePaddle/Paddle/pull/65235), [#65477](https://github.com/PaddlePaddle/Paddle/pull/65477), 
[#65517](https://github.com/PaddlePaddle/Paddle/pull/65517), [#65882](https://github.com/PaddlePaddle/Paddle/pull/65882), [#66346](https://github.com/PaddlePaddle/Paddle/pull/66346), [#66746](https://github.com/PaddlePaddle/Paddle/pull/66746), [#67786](https://github.com/PaddlePaddle/Paddle/pull/67786), [#67876](https://github.com/PaddlePaddle/Paddle/pull/67876), [#68113](https://github.com/PaddlePaddle/Paddle/pull/68113), [#68302](https://github.com/PaddlePaddle/Paddle/pull/68302), [#68337](https://github.com/PaddlePaddle/Paddle/pull/68337), [#68616](https://github.com/PaddlePaddle/Paddle/pull/68616), [#69354](https://github.com/PaddlePaddle/Paddle/pull/69354), [#70009](https://github.com/PaddlePaddle/Paddle/pull/70009), [#70877](https://github.com/PaddlePaddle/Paddle/pull/70877) +- SOT 端到端性能优化,减少子图打断,降低调度开销,提升转静训练性能。[#67591](https://github.com/PaddlePaddle/Paddle/pull/67591), [#67746](https://github.com/PaddlePaddle/Paddle/pull/67746), [#67823](https://github.com/PaddlePaddle/Paddle/pull/67823), [#67890](https://github.com/PaddlePaddle/Paddle/pull/67890), [#67921](https://github.com/PaddlePaddle/Paddle/pull/67921), [#68031](https://github.com/PaddlePaddle/Paddle/pull/68031), [#68153](https://github.com/PaddlePaddle/Paddle/pull/68153), [#68729](https://github.com/PaddlePaddle/Paddle/pull/68729), [#69249](https://github.com/PaddlePaddle/Paddle/pull/69249), [#69263](https://github.com/PaddlePaddle/Paddle/pull/69263), [#69300](https://github.com/PaddlePaddle/Paddle/pull/69300), [#69313](https://github.com/PaddlePaddle/Paddle/pull/69313), [#69325](https://github.com/PaddlePaddle/Paddle/pull/69325), [#69353](https://github.com/PaddlePaddle/Paddle/pull/69353), [#69411](https://github.com/PaddlePaddle/Paddle/pull/69411), [#69506](https://github.com/PaddlePaddle/Paddle/pull/69506), [#69672](https://github.com/PaddlePaddle/Paddle/pull/69672), [#69746](https://github.com/PaddlePaddle/Paddle/pull/69746), [#69834](https://github.com/PaddlePaddle/Paddle/pull/69834), 
[#69836](https://github.com/PaddlePaddle/Paddle/pull/69836), [#69852](https://github.com/PaddlePaddle/Paddle/pull/69852), [#69975](https://github.com/PaddlePaddle/Paddle/pull/69975), [#70151](https://github.com/PaddlePaddle/Paddle/pull/70151), [#70293](https://github.com/PaddlePaddle/Paddle/pull/70293), [#70405](https://github.com/PaddlePaddle/Paddle/pull/70405), [#70851](https://github.com/PaddlePaddle/Paddle/pull/70851), [#71039](https://github.com/PaddlePaddle/Paddle/pull/71039), [#71254](https://github.com/PaddlePaddle/Paddle/pull/71254), [#71295](https://github.com/PaddlePaddle/Paddle/pull/71295), [#71298](https://github.com/PaddlePaddle/Paddle/pull/71298), [#71346](https://github.com/PaddlePaddle/Paddle/pull/71346), [#71377](https://github.com/PaddlePaddle/Paddle/pull/71377), [#71407](https://github.com/PaddlePaddle/Paddle/pull/71407) +- 优化动态 shape 场景性能。 [#68491](https://github.com/PaddlePaddle/Paddle/pull/68491), [#68629](https://github.com/PaddlePaddle/Paddle/pull/68629) +- 加速 PIR 执行器执行速度。 [#69513](https://github.com/PaddlePaddle/Paddle/pull/69513) +- 优化 PIR 保存和加载性能。 [#69683](https://github.com/PaddlePaddle/Paddle/pull/69683) +- 针对 device 进行优化。 [#69676](https://github.com/PaddlePaddle/Paddle/pull/69676) +- 清理输入输出冗余信息。 [#66278](https://github.com/PaddlePaddle/Paddle/pull/66278) ### 废弃功能 -- 清理废弃的执行器等逻辑,减少冗余代码。[#64822](https://github.com/PaddlePaddle/Paddle/pull/64822), [#60941](https://github.com/PaddlePaddle/Paddle/pull/60941) -## 3.编译器架构 -在 3.0 版本下,编译器架构进行了重要升级。基于 Shape Dialect 构建了符号自动推导和化简体系,支持符号表达、约束构建,支撑了编译器动态形状下的端到端执行。同时飞桨编译器 CINN 全新升级了子图自动融合和 Pass Pipline 机制,合并了动、静态形状的核心模块,合并迭代路径,架构清晰统一。在此版本下,编译器在 AST Compute、Schedule 策略、Tiling 等重要后端模块进行了重构,提升了编译器的通用优化能力,在飞桨产业套件模型子图和典型大模型 Llama2-13B、Stable Diffusion 模型上验证了动形状的训练、推理正确性和提速性能。 +- 移除过时的测试用例。 [#66269](https://github.com/PaddlePaddle/Paddle/pull/66269), [#66690](https://github.com/PaddlePaddle/Paddle/pull/66690), [#67505](https://github.com/PaddlePaddle/Paddle/pull/67505), 
[#67464](https://github.com/PaddlePaddle/Paddle/pull/67464), [#68400](https://github.com/PaddlePaddle/Paddle/pull/68400), [#68178](https://github.com/PaddlePaddle/Paddle/pull/68178), [#68194](https://github.com/PaddlePaddle/Paddle/pull/68194) +- 清理废弃的 flag 和配置。 [#69124](https://github.com/PaddlePaddle/Paddle/pull/69124), [#69176](https://github.com/PaddlePaddle/Paddle/pull/69176), [#69274](https://github.com/PaddlePaddle/Paddle/pull/69274), [#68384](https://github.com/PaddlePaddle/Paddle/pull/68384) +- 淘汰旧 API。 [#66032](https://github.com/PaddlePaddle/Paddle/pull/66032), [#67303](https://github.com/PaddlePaddle/Paddle/pull/67303) +- 对 PIR 冗余策略及单测进行清理。 [#66366](https://github.com/PaddlePaddle/Paddle/pull/66366), [#70534](https://github.com/PaddlePaddle/Paddle/pull/70534), [#68444](https://github.com/PaddlePaddle/Paddle/pull/68444), [#70599](https://github.com/PaddlePaddle/Paddle/pull/70599), [#68801](https://github.com/PaddlePaddle/Paddle/pull/68801), [#66303](https://github.com/PaddlePaddle/Paddle/pull/66303), [#67854](https://github.com/PaddlePaddle/Paddle/pull/67854), [#70795](https://github.com/PaddlePaddle/Paddle/pull/70795) +- 废弃动转静相关单测、api 等。 [#66421](https://github.com/PaddlePaddle/Paddle/pull/66421), [#68251](https://github.com/PaddlePaddle/Paddle/pull/68251), [#68252](https://github.com/PaddlePaddle/Paddle/pull/68252), [#68253](https://github.com/PaddlePaddle/Paddle/pull/68253), [#68254](https://github.com/PaddlePaddle/Paddle/pull/68254), [#68409](https://github.com/PaddlePaddle/Paddle/pull/68409), [#70569](https://github.com/PaddlePaddle/Paddle/pull/70569), [#71279](https://github.com/PaddlePaddle/Paddle/pull/71279) +- 废弃自动并行相关单测。 [#67857](https://github.com/PaddlePaddle/Paddle/pull/67857), [#67862](https://github.com/PaddlePaddle/Paddle/pull/67862), [#67995](https://github.com/PaddlePaddle/Paddle/pull/67995), [#68012](https://github.com/PaddlePaddle/Paddle/pull/68012), [#68013](https://github.com/PaddlePaddle/Paddle/pull/68013), 
[#67798](https://github.com/PaddlePaddle/Paddle/pull/67798) + +## 3. 编译器架构 + +CINN 编译器在完备性、性能表现等方面效果全面提升。此版本中,我们对编译器前端、后端各个环节进行了全面优化:包括新增反向计算图自动 Re-Compute 机制、前端 Pass 性能优化、符号推导机制升级、算子融合策略优化、后端 Schedule 策略和下标表达式化简能力增强等,同时排查并修复了大量正确性和性能问题,系统化地提升了编译器的通用优化能力。飞桨 PaddleX 系列模型开启 CINN 编译器后,超过 60% 的模型相比动态图模式有显著性能提升。 ### 新功能 -1. 升级了全新的子图自动融合机制,创新性提出了 TrivialOp 和 ReduceOp 融合理论,支持更广泛的垂直融合和水平融合范围,保障了子图融合的正确性和鲁棒性,充分发挥神经网络编译器的融合潜力([#63340](https://github.com/PaddlePaddle/Paddle/pull/63340)、[#63913](https://github.com/PaddlePaddle/Paddle/pull/63913)、[#63579](https://github.com/PaddlePaddle/Paddle/pull/63579)、[#63605](https://github.com/PaddlePaddle/Paddle/pull/63605)、[#60769](https://github.com/PaddlePaddle/Paddle/pull/60769)、[#62088](https://github.com/PaddlePaddle/Paddle/pull/62088)、[#63124](https://github.com/PaddlePaddle/Paddle/pull/63124)、[#63658](https://github.com/PaddlePaddle/Paddle/pull/63658)、[#64557](https://github.com/PaddlePaddle/Paddle/pull/64557)、[#63318](https://github.com/PaddlePaddle/Paddle/pull/63318)、[#62545](https://github.com/PaddlePaddle/Paddle/pull/62545)) -2. 
新增支持了动态形状的符号推导功能,基于 Shape Dialect 实现了动态符号构建、自动推导、约束表达、符号化简等机制,引入 DimExpr 概念,升级支持了飞桨框架 150+个典型基础算子的 InferSymbolicShape 逻辑,为编译器支持动态形状下的训练和推理提供更多信息([#60843](https://github.com/PaddlePaddle/Paddle/pull/60843)、[#62662](https://github.com/PaddlePaddle/Paddle/pull/62662)、[#63790](https://github.com/PaddlePaddle/Paddle/pull/63790)、[#60098](https://github.com/PaddlePaddle/Paddle/pull/60098)、[#60511](https://github.com/PaddlePaddle/Paddle/pull/60511)、[#61232](https://github.com/PaddlePaddle/Paddle/pull/61232)、[#61939](https://github.com/PaddlePaddle/Paddle/pull/61939)、[#62798](https://github.com/PaddlePaddle/Paddle/pull/62798)、[#62955](https://github.com/PaddlePaddle/Paddle/pull/62955)、[#63029](https://github.com/PaddlePaddle/Paddle/pull/63029)、[#60572](https://github.com/PaddlePaddle/Paddle/pull/60572)、[#61035](https://github.com/PaddlePaddle/Paddle/pull/61035)、[#61224](https://github.com/PaddlePaddle/Paddle/pull/61224)、[#61587](https://github.com/PaddlePaddle/Paddle/pull/61587)、[#61937](https://github.com/PaddlePaddle/Paddle/pull/61937)、[#62314](https://github.com/PaddlePaddle/Paddle/pull/62314)、[#62394](https://github.com/PaddlePaddle/Paddle/pull/62394)、[#62569](https://github.com/PaddlePaddle/Paddle/pull/62569)、[#62495](https://github.com/PaddlePaddle/Paddle/pull/62495)、[#62844](https://github.com/PaddlePaddle/Paddle/pull/62844)、[#63000](https://github.com/PaddlePaddle/Paddle/pull/63000)、[#63016](https://github.com/PaddlePaddle/Paddle/pull/63016)、[#64222](https://github.com/PaddlePaddle/Paddle/pull/64222)、[#60129](https://github.com/PaddlePaddle/Paddle/pull/60129)、[#60899](https://github.com/PaddlePaddle/Paddle/pull/60899)、[#61342](https://github.com/PaddlePaddle/Paddle/pull/61342)、[#61439](https://github.com/PaddlePaddle/Paddle/pull/61439)、[#62766](https://github.com/PaddlePaddle/Paddle/pull/62766)、[#61133](https://github.com/PaddlePaddle/Paddle/pull/61133)、[#61430](https://github.com/PaddlePaddle/Paddle/pull/61430)、[#61498](https://github.com/PaddlePaddle/Paddle/pull/61
498)、[#61680](https://github.com/PaddlePaddle/Paddle/pull/61680)、[#63367](https://github.com/PaddlePaddle/Paddle/pull/63367)、[#62151](https://github.com/PaddlePaddle/Paddle/pull/62151)、[#62665](https://github.com/PaddlePaddle/Paddle/pull/62665)、[#61407](https://github.com/PaddlePaddle/Paddle/pull/61407)、[#61502](https://github.com/PaddlePaddle/Paddle/pull/61502)、[#61655](https://github.com/PaddlePaddle/Paddle/pull/61655)、[#64115](https://github.com/PaddlePaddle/Paddle/pull/64115)、[#61791](https://github.com/PaddlePaddle/Paddle/pull/61791)、[#62141](https://github.com/PaddlePaddle/Paddle/pull/62141)、[#63422](https://github.com/PaddlePaddle/Paddle/pull/63422)、[#63577](https://github.com/PaddlePaddle/Paddle/pull/63577)、[#63978](https://github.com/PaddlePaddle/Paddle/pull/63978)、[#63576](https://github.com/PaddlePaddle/Paddle/pull/63576)、[#63947](https://github.com/PaddlePaddle/Paddle/pull/63947)、[#64332](https://github.com/PaddlePaddle/Paddle/pull/64332)、[#63990](https://github.com/PaddlePaddle/Paddle/pull/63990)) -3. 新增了 Pass Pipline 功能,包括 PdToCinn、CinnPreprocess、BuildGroupOp、DivideGroupOp、CinnLowering、精度检查等 Pass 策略,统一支持动、静形状下子图的 Lowering 和执行,架构清晰([#61611](https://github.com/PaddlePaddle/Paddle/pull/61611)、[#62612](https://github.com/PaddlePaddle/Paddle/pull/62612)、[#64354](https://github.com/PaddlePaddle/Paddle/pull/64354)、[#61848](https://github.com/PaddlePaddle/Paddle/pull/61848)、[#62316](https://github.com/PaddlePaddle/Paddle/pull/62316)、[#64152](https://github.com/PaddlePaddle/Paddle/pull/64152)、[#61619](https://github.com/PaddlePaddle/Paddle/pull/61619)、[#62318](https://github.com/PaddlePaddle/Paddle/pull/62318)、[#61977](https://github.com/PaddlePaddle/Paddle/pull/61977)、[#62211](https://github.com/PaddlePaddle/Paddle/pull/62211)、[#63972](https://github.com/PaddlePaddle/Paddle/pull/63972)、[#63686](https://github.com/PaddlePaddle/Paddle/pull/63686)、[#64505](https://github.com/PaddlePaddle/Paddle/pull/64505)) -4. 
新增支持了 BuketLower 和 DyShapeSchdule 功能,根据动态形状的范围实现自动分桶编译优化;并适配升级了 CodeGen 模块逻辑,支持 InferShape 函数生成和 Host 函数的条件分支分发功能,支撑大模型的动态 Shape 下训练推理加速([#62730](https://github.com/PaddlePaddle/Paddle/pull/62730)、[#61115](https://github.com/PaddlePaddle/Paddle/pull/61115)、[#59941](https://github.com/PaddlePaddle/Paddle/pull/59941)、[#62207](https://github.com/PaddlePaddle/Paddle/pull/62207)、[#64318](https://github.com/PaddlePaddle/Paddle/pull/64318)、[#64345](https://github.com/PaddlePaddle/Paddle/pull/64345)、[#60519](https://github.com/PaddlePaddle/Paddle/pull/60519)、[#62584](https://github.com/PaddlePaddle/Paddle/pull/62584)、[#60828](https://github.com/PaddlePaddle/Paddle/pull/60828)、[#60533](https://github.com/PaddlePaddle/Paddle/pull/60533)、[#61436](https://github.com/PaddlePaddle/Paddle/pull/61436)、[#62071](https://github.com/PaddlePaddle/Paddle/pull/62071)、[#63971](https://github.com/PaddlePaddle/Paddle/pull/63971)、[#61656](https://github.com/PaddlePaddle/Paddle/pull/61656)、[#63083](https://github.com/PaddlePaddle/Paddle/pull/63083)、[#64405](https://github.com/PaddlePaddle/Paddle/pull/64405)、[#63047](https://github.com/PaddlePaddle/Paddle/pull/63047)、[#64655](https://github.com/PaddlePaddle/Paddle/pull/64655)、[#63095](https://github.com/PaddlePaddle/Paddle/pull/63095)、[#63829](https://github.com/PaddlePaddle/Paddle/pull/63829)、[#63572](https://github.com/PaddlePaddle/Paddle/pull/63572)) -5. 新增支持了编译缓存策略,自动识别、合并和复用相同子图结构的编译结果,使用多线程提升编译效率,提升用户的使用体验([#62952](https://github.com/PaddlePaddle/Paddle/pull/62952)、[#63269](https://github.com/PaddlePaddle/Paddle/pull/63269)、[#64718](https://github.com/PaddlePaddle/Paddle/pull/64718)、[#61367](https://github.com/PaddlePaddle/Paddle/pull/61367)、[#63305](https://github.com/PaddlePaddle/Paddle/pull/63305)、[#63750](https://github.com/PaddlePaddle/Paddle/pull/63750)、[#63871](https://github.com/PaddlePaddle/Paddle/pull/63871)、[#64893](https://github.com/PaddlePaddle/Paddle/pull/64893)) -6. 
新增支持了 GenerateShape 机制,添加了对应的 AST Compute 算子定义,支持动态符号的自动解析,以及在 Lowering 阶段自动生成 ShapeOp([#64167](https://github.com/PaddlePaddle/Paddle/pull/64167)、[#64636](https://github.com/PaddlePaddle/Paddle/pull/64636)、[#61993](https://github.com/PaddlePaddle/Paddle/pull/61993)、[#64843](https://github.com/PaddlePaddle/Paddle/pull/64843)、[#62587](https://github.com/PaddlePaddle/Paddle/pull/62587)) + +1. 新硬件后端支持:新增 HIP 和 SYCL 两种后端的支持。([#65146](https://github.com/PaddlePaddle/Paddle/pull/65146)、[#65329](https://github.com/PaddlePaddle/Paddle/pull/65329)、[#69554](https://github.com/PaddlePaddle/Paddle/pull/69554)、[#71204](https://github.com/PaddlePaddle/Paddle/pull/71204)、[#65438](https://github.com/PaddlePaddle/Paddle/pull/65438)、[#66476](https://github.com/PaddlePaddle/Paddle/pull/66476)、[#66620](https://github.com/PaddlePaddle/Paddle/pull/66620)、[#67813](https://github.com/PaddlePaddle/Paddle/pull/67813)) +2. 新增支持了推理场景下符号维度的数值范围、相等约束等信息的手工设置。([#67628](https://github.com/PaddlePaddle/Paddle/pull/67628)、[#67384](https://github.com/PaddlePaddle/Paddle/pull/67384)) ### 功能优化 -1. 优化了 BuildCinnPass 逻辑,升级编译器对黑白名单算子的感知策略,提升了 Pass 逻辑的鲁棒性([#62372](https://github.com/PaddlePaddle/Paddle/pull/62372)、[#61081](https://github.com/PaddlePaddle/Paddle/pull/61081)、[#61225](https://github.com/PaddlePaddle/Paddle/pull/61225)、[#58863](https://github.com/PaddlePaddle/Paddle/pull/58863)) -2. 优化了 OpLoweringGroup 数据结构,移除了不必要的接口和成员,降低上下游模块的耦合度([#62339](https://github.com/PaddlePaddle/Paddle/pull/62339)) -3. 优化了编译器关于架构 Arch 的组件设计,抽象硬件概念,降低国产硬件的适配成本([#63530](https://github.com/PaddlePaddle/Paddle/pull/63530)、[#64347](https://github.com/PaddlePaddle/Paddle/pull/64347)、[#64506](https://github.com/PaddlePaddle/Paddle/pull/64506)、[#64587](https://github.com/PaddlePaddle/Paddle/pull/64587)) -4. 
升级了编译器后端算子 AST Compute 模块,适配支持了动态 Shape 的计算逻辑([#62488](https://github.com/PaddlePaddle/Paddle/pull/62488)、[#63581](https://github.com/PaddlePaddle/Paddle/pull/63581)、[#63687](https://github.com/PaddlePaddle/Paddle/pull/63687)、[#63654](https://github.com/PaddlePaddle/Paddle/pull/63654)、[#64217](https://github.com/PaddlePaddle/Paddle/pull/64217)) + +1. 优化报错信息打印,提升开发调试体验。([#67738](https://github.com/PaddlePaddle/Paddle/pull/67738)、[#68769](https://github.com/PaddlePaddle/Paddle/pull/68769)、[#71076](https://github.com/PaddlePaddle/Paddle/pull/71076)) +2. 支持 Welford 算法,可以同时保证 BatchNorm 相关算子 Kernel 的性能和精度。([#71184](https://github.com/PaddlePaddle/Paddle/pull/71184)、[#71057](https://github.com/PaddlePaddle/Paddle/pull/71057)) ### 性能优化 -1. 优化了 AST IR 的 Schedule 逻辑,重构了 Vectorize、Unroll、AxisBind、ComputeAt 等核心模块,合并动静形状迭代路径,降低开发维护成本([#60449](https://github.com/PaddlePaddle/Paddle/pull/60449)、[#60155](https://github.com/PaddlePaddle/Paddle/pull/60155)、[#60342](https://github.com/PaddlePaddle/Paddle/pull/60342)、[#60498](https://github.com/PaddlePaddle/Paddle/pull/60498)、[#60538](https://github.com/PaddlePaddle/Paddle/pull/60538)、[#60190](https://github.com/PaddlePaddle/Paddle/pull/60190)、[#61197](https://github.com/PaddlePaddle/Paddle/pull/61197)、[#63140](https://github.com/PaddlePaddle/Paddle/pull/63140)、[#61156](https://github.com/PaddlePaddle/Paddle/pull/61156)) -2. 
优化了 Tiling 策略和 temp Buffer 功能,支持 warp-level 内存连续 Read 和 cache_read cache_write 功能, 提升子图执行性能([#64240](https://github.com/PaddlePaddle/Paddle/pull/64240)、[#60562](https://github.com/PaddlePaddle/Paddle/pull/60562)、[#64711](https://github.com/PaddlePaddle/Paddle/pull/64711)、[#62856](https://github.com/PaddlePaddle/Paddle/pull/62856)、[#61576](https://github.com/PaddlePaddle/Paddle/pull/61576)、[#61901](https://github.com/PaddlePaddle/Paddle/pull/61901)、[#62581](https://github.com/PaddlePaddle/Paddle/pull/62581)、[#61987](https://github.com/PaddlePaddle/Paddle/pull/61987)、[#60190](https://github.com/PaddlePaddle/Paddle/pull/60190)、[#63138](https://github.com/PaddlePaddle/Paddle/pull/63138)、[#62517](https://github.com/PaddlePaddle/Paddle/pull/62517)) -3. 支持 Schedule 配置的自动搜索功能,AOT 式离线保存机制实现子图 Kernel 的性能加速([#64271](https://github.com/PaddlePaddle/Paddle/pull/64271)、[#64588](https://github.com/PaddlePaddle/Paddle/pull/64588)、[#64694](https://github.com/PaddlePaddle/Paddle/pull/64694)、[#64620](https://github.com/PaddlePaddle/Paddle/pull/64620)、[#64702](https://github.com/PaddlePaddle/Paddle/pull/64702)、[#63086](https://github.com/PaddlePaddle/Paddle/pull/63086)) -4. 支持了 OptimizeReductionTactic 优化策略,提升 Reduce 场景下的 kernel 性能([#6066](https://github.com/PaddlePaddle/Paddle/pull/60661)、[#61363](https://github.com/PaddlePaddle/Paddle/pull/61363)、[#60881](https://github.com/PaddlePaddle/Paddle/pull/60881)、[#63859](https://github.com/PaddlePaddle/Paddle/pull/63859)) -5. 增强了 DCE Pass 功能,移除了多余的 If/For 分支代码,提升执行效率([#61682](https://github.com/PaddlePaddle/Paddle/pull/61682)) -6. 新增支持了 FuseParallelMatmulPass Pass,可融合多个 Matmul 算子实现加速([#63623](https://github.com/PaddlePaddle/Paddle/pull/63623)) + +1. 
新增了 GridReduce、Loop 合并、Transpose 调优、自动向量化等后端优化策略,显著提升了各种维度空间、不同硬件配置全场景下的 Kernel 性能。([#67236](https://github.com/PaddlePaddle/Paddle/pull/67236)、[#68897](https://github.com/PaddlePaddle/Paddle/pull/68897)、[#69409](https://github.com/PaddlePaddle/Paddle/pull/69409)、[#65336](https://github.com/PaddlePaddle/Paddle/pull/65336)、[#66419](https://github.com/PaddlePaddle/Paddle/pull/66419)、[#68338](https://github.com/PaddlePaddle/Paddle/pull/68338)、[#68364](https://github.com/PaddlePaddle/Paddle/pull/68364)、[#71087](https://github.com/PaddlePaddle/Paddle/pull/71087)、[#68019](https://github.com/PaddlePaddle/Paddle/pull/68019)、[#68122](https://github.com/PaddlePaddle/Paddle/pull/68122)、[#65187](https://github.com/PaddlePaddle/Paddle/pull/65187)、[#66742](https://github.com/PaddlePaddle/Paddle/pull/66742)、[#67083](https://github.com/PaddlePaddle/Paddle/pull/67083)、[#68667](https://github.com/PaddlePaddle/Paddle/pull/68667)、[#68750](https://github.com/PaddlePaddle/Paddle/pull/68750)、[#69376](https://github.com/PaddlePaddle/Paddle/pull/69376)、[#69350](https://github.com/PaddlePaddle/Paddle/pull/69350)、[#69740](https://github.com/PaddlePaddle/Paddle/pull/69740)、[#68918](https://github.com/PaddlePaddle/Paddle/pull/68918)、[#70092](https://github.com/PaddlePaddle/Paddle/pull/70092)、[#69607](https://github.com/PaddlePaddle/Paddle/pull/69607)、[#69794](https://github.com/PaddlePaddle/Paddle/pull/69794)、[#70258](https://github.com/PaddlePaddle/Paddle/pull/70258)、[#70547](https://github.com/PaddlePaddle/Paddle/pull/70547)、[#70581](https://github.com/PaddlePaddle/Paddle/pull/70581)、[#70649](https://github.com/PaddlePaddle/Paddle/pull/70649)、[#69732](https://github.com/PaddlePaddle/Paddle/pull/69732)、[#70786](https://github.com/PaddlePaddle/Paddle/pull/70786)、[#70942](https://github.com/PaddlePaddle/Paddle/pull/70942)、[#71014](https://github.com/PaddlePaddle/Paddle/pull/71014)、[#71263](https://github.com/PaddlePaddle/Paddle/pull/71263)、[#71249](https://github.com/PaddlePaddle/Paddle/pull/712
49)、[#71340](https://github.com/PaddlePaddle/Paddle/pull/71340)、[#71301](https://github.com/PaddlePaddle/Paddle/pull/71301)、[#71380](https://github.com/PaddlePaddle/Paddle/pull/71380)) +2. 优化算子融合策略,升级了包括水平融合、多下游融合、Reshape 对齐融合等多种策略,进一步增强算子的融合能力,提升端到端优化性能。([#66034](https://github.com/PaddlePaddle/Paddle/pull/66034)、[#67829](https://github.com/PaddlePaddle/Paddle/pull/67829)、[#68171](https://github.com/PaddlePaddle/Paddle/pull/68171)、[#69478](https://github.com/PaddlePaddle/Paddle/pull/69478)、[#69691](https://github.com/PaddlePaddle/Paddle/pull/69691)、[#70665](https://github.com/PaddlePaddle/Paddle/pull/70665)、[#71103](https://github.com/PaddlePaddle/Paddle/pull/71103)、[#70873](https://github.com/PaddlePaddle/Paddle/pull/70873)) +3. 升级了后端下标表达式的化简能力,支持动静态维度的复杂表达式化简,显著降低后端生成 Kernel 的下标计算开销。([#68011](https://github.com/PaddlePaddle/Paddle/pull/68011)、[#68617](https://github.com/PaddlePaddle/Paddle/pull/68617)、[#68624](https://github.com/PaddlePaddle/Paddle/pull/68624)、[#68685](https://github.com/PaddlePaddle/Paddle/pull/68685)、[#68220](https://github.com/PaddlePaddle/Paddle/pull/68220)、[#68720](https://github.com/PaddlePaddle/Paddle/pull/68720)、[#68753](https://github.com/PaddlePaddle/Paddle/pull/68753)、[#68986](https://github.com/PaddlePaddle/Paddle/pull/68986)、[#68987](https://github.com/PaddlePaddle/Paddle/pull/68987)、[#69071](https://github.com/PaddlePaddle/Paddle/pull/69071)、[#69164](https://github.com/PaddlePaddle/Paddle/pull/69164)、 
[#69282](https://github.com/PaddlePaddle/Paddle/pull/69282)、[#69522](https://github.com/PaddlePaddle/Paddle/pull/69522)、[#69857](https://github.com/PaddlePaddle/Paddle/pull/69857)、[#70208](https://github.com/PaddlePaddle/Paddle/pull/70208)、[#70355](https://github.com/PaddlePaddle/Paddle/pull/70355)、[#70427](https://github.com/PaddlePaddle/Paddle/pull/70427)、[#70450](https://github.com/PaddlePaddle/Paddle/pull/70450)、[#68737](https://github.com/PaddlePaddle/Paddle/pull/68737)、[#70500](https://github.com/PaddlePaddle/Paddle/pull/70500)、[#70953](https://github.com/PaddlePaddle/Paddle/pull/70953)、[#70933](https://github.com/PaddlePaddle/Paddle/pull/70933)、[#71026](https://github.com/PaddlePaddle/Paddle/pull/71026)、[#70456](https://github.com/PaddlePaddle/Paddle/pull/70456)、[#70257](https://github.com/PaddlePaddle/Paddle/pull/70257)、[#70461](https://github.com/PaddlePaddle/Paddle/pull/70461)、[#70142](https://github.com/PaddlePaddle/Paddle/pull/70142)、[#71018](https://github.com/PaddlePaddle/Paddle/pull/71018)、[#71278](https://github.com/PaddlePaddle/Paddle/pull/71278)) +4. 新增了反向计算图自动 Re-Compute 机制,可有效降低模型训练显存并提升性能。([#69342](https://github.com/PaddlePaddle/Paddle/pull/69342)、[#70255](https://github.com/PaddlePaddle/Paddle/pull/70255)、[#68241](https://github.com/PaddlePaddle/Paddle/pull/68241)、[#69954](https://github.com/PaddlePaddle/Paddle/pull/69954)、[#70832](https://github.com/PaddlePaddle/Paddle/pull/70832)) +5. 
优化后端 Host、Device 代码编译流程,降低编译耗时,同时提升 Broadcast 场景下分支的处理性能。([#65669](https://github.com/PaddlePaddle/Paddle/pull/65669)、[#65916](https://github.com/PaddlePaddle/Paddle/pull/65916)、[#66109](https://github.com/PaddlePaddle/Paddle/pull/66109)、[#65611](https://github.com/PaddlePaddle/Paddle/pull/65611)、[#65990](https://github.com/PaddlePaddle/Paddle/pull/65990)、[#66088](https://github.com/PaddlePaddle/Paddle/pull/66088)、[#66207](https://github.com/PaddlePaddle/Paddle/pull/66207)、[#66537](https://github.com/PaddlePaddle/Paddle/pull/66537)、[#66768](https://github.com/PaddlePaddle/Paddle/pull/66768)、[#70685](https://github.com/PaddlePaddle/Paddle/pull/70685)、[#71410](https://github.com/PaddlePaddle/Paddle/pull/71410)、[#66062](https://github.com/PaddlePaddle/Paddle/pull/66062)) +6. 完善升级了动态维度的符号推导、化简、缓存等机制,添加了所有常规算子(580+)的符号推导接口实现,为 Kernel 编译提供更多约束信息。([#65343](https://github.com/PaddlePaddle/Paddle/pull/65343)、[#66582](https://github.com/PaddlePaddle/Paddle/pull/66582)、[#65500](https://github.com/PaddlePaddle/Paddle/pull/65500)、[#65591](https://github.com/PaddlePaddle/Paddle/pull/65591)、[#66637](https://github.com/PaddlePaddle/Paddle/pull/66637)、[#68208](https://github.com/PaddlePaddle/Paddle/pull/68208)、[#68056](https://github.com/PaddlePaddle/Paddle/pull/68056)、[#68015](https://github.com/PaddlePaddle/Paddle/pull/68015)、[#68096](https://github.com/PaddlePaddle/Paddle/pull/68096)、[#68236](https://github.com/PaddlePaddle/Paddle/pull/68236)、[#68973](https://github.com/PaddlePaddle/Paddle/pull/68973)、[#68967](https://github.com/PaddlePaddle/Paddle/pull/68967)、[#69133](https://github.com/PaddlePaddle/Paddle/pull/69133)、[#68550](https://github.com/PaddlePaddle/Paddle/pull/68550)、[#68882](https://github.com/PaddlePaddle/Paddle/pull/68882)、[#69005](https://github.com/PaddlePaddle/Paddle/pull/69005)、[#69911](https://github.com/PaddlePaddle/Paddle/pull/69911)、[#70376](https://github.com/PaddlePaddle/Paddle/pull/70376)、[#71153](https://github.com/PaddlePaddle/Paddle/pull/71153)、[#66644
](https://github.com/PaddlePaddle/Paddle/pull/66644)、[#66650](https://github.com/PaddlePaddle/Paddle/pull/66650)、[#66642](https://github.com/PaddlePaddle/Paddle/pull/66642)、[#66729](https://github.com/PaddlePaddle/Paddle/pull/66729)、[#66838](https://github.com/PaddlePaddle/Paddle/pull/66838)、[#66762](https://github.com/PaddlePaddle/Paddle/pull/66762)、[#66580](https://github.com/PaddlePaddle/Paddle/pull/66580)、[#66612](https://github.com/PaddlePaddle/Paddle/pull/66612)、[#66625](https://github.com/PaddlePaddle/Paddle/pull/66625)、[#66643](https://github.com/PaddlePaddle/Paddle/pull/66643)、[#66837](https://github.com/PaddlePaddle/Paddle/pull/66837)、[#66946](https://github.com/PaddlePaddle/Paddle/pull/66946)、[#67018](https://github.com/PaddlePaddle/Paddle/pull/67018)、[#67049](https://github.com/PaddlePaddle/Paddle/pull/67049)、[#66956](https://github.com/PaddlePaddle/Paddle/pull/66956)、[#67008](https://github.com/PaddlePaddle/Paddle/pull/67008)、[#66930](https://github.com/PaddlePaddle/Paddle/pull/66930)、[#66877](https://github.com/PaddlePaddle/Paddle/pull/66877)、[#66896](https://github.com/PaddlePaddle/Paddle/pull/66896)、[#67120](https://github.com/PaddlePaddle/Paddle/pull/67120)、[#67117](https://github.com/PaddlePaddle/Paddle/pull/67117)、[#67098](https://github.com/PaddlePaddle/Paddle/pull/67098)、[#67136](https://github.com/PaddlePaddle/Paddle/pull/67136)、[#67294](https://github.com/PaddlePaddle/Paddle/pull/67294)、[#67327](https://github.com/PaddlePaddle/Paddle/pull/67327)、[#66827](https://github.com/PaddlePaddle/Paddle/pull/66827)、[#67201](https://github.com/PaddlePaddle/Paddle/pull/67201)、[#66892](https://github.com/PaddlePaddle/Paddle/pull/66892)、[#67377](https://github.com/PaddlePaddle/Paddle/pull/67377)、[#66619](https://github.com/PaddlePaddle/Paddle/pull/66619)、[#67037](https://github.com/PaddlePaddle/Paddle/pull/67037)、[#67412](https://github.com/PaddlePaddle/Paddle/pull/67412)、[#67394](https://github.com/PaddlePaddle/Paddle/pull/67394)、[#67374](https://github.com
/PaddlePaddle/Paddle/pull/67374)、[#67418](https://github.com/PaddlePaddle/Paddle/pull/67418)、[#67348](https://github.com/PaddlePaddle/Paddle/pull/67348)、[#67337](https://github.com/PaddlePaddle/Paddle/pull/67337)、[#67390](https://github.com/PaddlePaddle/Paddle/pull/67390)、[#67407](https://github.com/PaddlePaddle/Paddle/pull/67407)、[#67491](https://github.com/PaddlePaddle/Paddle/pull/67491)、[#67422](https://github.com/PaddlePaddle/Paddle/pull/67422)、[#67461](https://github.com/PaddlePaddle/Paddle/pull/67461)、[#67458](https://github.com/PaddlePaddle/Paddle/pull/67458)、[#67486](https://github.com/PaddlePaddle/Paddle/pull/67486)、[#67490](https://github.com/PaddlePaddle/Paddle/pull/67490)、[#67462](https://github.com/PaddlePaddle/Paddle/pull/67462)、[#67364](https://github.com/PaddlePaddle/Paddle/pull/67364)、[#67435](https://github.com/PaddlePaddle/Paddle/pull/67435)、[#67665](https://github.com/PaddlePaddle/Paddle/pull/67665)、[#67426](https://github.com/PaddlePaddle/Paddle/pull/67426)、[#67507](https://github.com/PaddlePaddle/Paddle/pull/67507)、[#67730](https://github.com/PaddlePaddle/Paddle/pull/67730)、[#67776](https://github.com/PaddlePaddle/Paddle/pull/67776)、[#67806](https://github.com/PaddlePaddle/Paddle/pull/67806)、[#67803](https://github.com/PaddlePaddle/Paddle/pull/67803)、[#67788](https://github.com/PaddlePaddle/Paddle/pull/67788)、[#67705](https://github.com/PaddlePaddle/Paddle/pull/67705)、[#67814](https://github.com/PaddlePaddle/Paddle/pull/67814)、[#67858](https://github.com/PaddlePaddle/Paddle/pull/67858)、[#67751](https://github.com/PaddlePaddle/Paddle/pull/67751)、[#67875](https://github.com/PaddlePaddle/Paddle/pull/67875)、[#67663](https://github.com/PaddlePaddle/Paddle/pull/67663)、[#67434](https://github.com/PaddlePaddle/Paddle/pull/67434)、[#67818](https://github.com/PaddlePaddle/Paddle/pull/67818)、[#68180](https://github.com/PaddlePaddle/Paddle/pull/68180)、[#68547](https://github.com/PaddlePaddle/Paddle/pull/68547)、[#68548](https://github.com/PaddlePaddle/Paddle
/pull/68548)、[#68670](https://github.com/PaddlePaddle/Paddle/pull/68670)、[#68964](https://github.com/PaddlePaddle/Paddle/pull/68964)、[#68929](https://github.com/PaddlePaddle/Paddle/pull/68929)、[#68907](https://github.com/PaddlePaddle/Paddle/pull/68907)、[#68917](https://github.com/PaddlePaddle/Paddle/pull/68917)、[#68984](https://github.com/PaddlePaddle/Paddle/pull/68984)、[#68644](https://github.com/PaddlePaddle/Paddle/pull/68644)、[#69167](https://github.com/PaddlePaddle/Paddle/pull/69167)、[#68975](https://github.com/PaddlePaddle/Paddle/pull/68975)、[#68947](https://github.com/PaddlePaddle/Paddle/pull/68947)、[#68978](https://github.com/PaddlePaddle/Paddle/pull/68978)、[#68980](https://github.com/PaddlePaddle/Paddle/pull/68980)、[#68979](https://github.com/PaddlePaddle/Paddle/pull/68979)、[#69329](https://github.com/PaddlePaddle/Paddle/pull/69329)、[#69055](https://github.com/PaddlePaddle/Paddle/pull/69055)、[#69331](https://github.com/PaddlePaddle/Paddle/pull/69331)、[#69414](https://github.com/PaddlePaddle/Paddle/pull/69414)、[#69335](https://github.com/PaddlePaddle/Paddle/pull/69335)、[#69017](https://github.com/PaddlePaddle/Paddle/pull/69017)、[#69344](https://github.com/PaddlePaddle/Paddle/pull/69344)、[#69069](https://github.com/PaddlePaddle/Paddle/pull/69069)、[#69698](https://github.com/PaddlePaddle/Paddle/pull/69698)、[#69919](https://github.com/PaddlePaddle/Paddle/pull/69919)、[#69964](https://github.com/PaddlePaddle/Paddle/pull/69964)、[#70337](https://github.com/PaddlePaddle/Paddle/pull/70337)、[#70282](https://github.com/PaddlePaddle/Paddle/pull/70282)、[#70741](https://github.com/PaddlePaddle/Paddle/pull/70741)、[#70818](https://github.com/PaddlePaddle/Paddle/pull/70818)、[#71031](https://github.com/PaddlePaddle/Paddle/pull/71031)、[#70541](https://github.com/PaddlePaddle/Paddle/pull/70541)、[#66609](https://github.com/PaddlePaddle/Paddle/pull/66609)、[#66889](https://github.com/PaddlePaddle/Paddle/pull/66889)、[#66633](https://github.com/PaddlePaddle/Paddle/pull/66633)、[#66735
](https://github.com/PaddlePaddle/Paddle/pull/66735)、[#66935](https://github.com/PaddlePaddle/Paddle/pull/66935)、[#66627](https://github.com/PaddlePaddle/Paddle/pull/66627)、[#66730](https://github.com/PaddlePaddle/Paddle/pull/66730)、[#67210](https://github.com/PaddlePaddle/Paddle/pull/67210)、[#67115](https://github.com/PaddlePaddle/Paddle/pull/67115)、[#67275](https://github.com/PaddlePaddle/Paddle/pull/67275)、[#67472](https://github.com/PaddlePaddle/Paddle/pull/67472)、[#67577](https://github.com/PaddlePaddle/Paddle/pull/67577)、[#67328](https://github.com/PaddlePaddle/Paddle/pull/67328)、[#67566](https://github.com/PaddlePaddle/Paddle/pull/67566)、[#67451](https://github.com/PaddlePaddle/Paddle/pull/67451)、[#68098](https://github.com/PaddlePaddle/Paddle/pull/68098)、[#68225](https://github.com/PaddlePaddle/Paddle/pull/68225)、[#68177](https://github.com/PaddlePaddle/Paddle/pull/68177)、[#68102](https://github.com/PaddlePaddle/Paddle/pull/68102)、[#67951](https://github.com/PaddlePaddle/Paddle/pull/67951)、[#67957](https://github.com/PaddlePaddle/Paddle/pull/67957)、[#68235](https://github.com/PaddlePaddle/Paddle/pull/68235)、[#68447](https://github.com/PaddlePaddle/Paddle/pull/68447)、[#68446](https://github.com/PaddlePaddle/Paddle/pull/68446)、[#68183](https://github.com/PaddlePaddle/Paddle/pull/68183)、[#68318](https://github.com/PaddlePaddle/Paddle/pull/68318)、[#68385](https://github.com/PaddlePaddle/Paddle/pull/68385)、[#67635](https://github.com/PaddlePaddle/Paddle/pull/67635)、[#65623](https://github.com/PaddlePaddle/Paddle/pull/65623)、[#65956](https://github.com/PaddlePaddle/Paddle/pull/65956)、[#66063](https://github.com/PaddlePaddle/Paddle/pull/66063)、[#65992](https://github.com/PaddlePaddle/Paddle/pull/65992)、[#65880](https://github.com/PaddlePaddle/Paddle/pull/65880)、[#66343](https://github.com/PaddlePaddle/Paddle/pull/66343)、[#65889](https://github.com/PaddlePaddle/Paddle/pull/65889)、[#66606](https://github.com/PaddlePaddle/Paddle/pull/66606)、[#66618](https://github.com
/PaddlePaddle/Paddle/pull/66618)、[#66737](https://github.com/PaddlePaddle/Paddle/pull/66737)、[#66607](https://github.com/PaddlePaddle/Paddle/pull/66607)、[#66579](https://github.com/PaddlePaddle/Paddle/pull/66579)、[#66732](https://github.com/PaddlePaddle/Paddle/pull/66732)、[#66849](https://github.com/PaddlePaddle/Paddle/pull/66849)、[#66400](https://github.com/PaddlePaddle/Paddle/pull/66400)、[#66952](https://github.com/PaddlePaddle/Paddle/pull/66952)、[#66570](https://github.com/PaddlePaddle/Paddle/pull/66570)、[#66967](https://github.com/PaddlePaddle/Paddle/pull/66967)、[#66595](https://github.com/PaddlePaddle/Paddle/pull/66595)、[#67121](https://github.com/PaddlePaddle/Paddle/pull/67121)、[#67206](https://github.com/PaddlePaddle/Paddle/pull/67206)、[#67444](https://github.com/PaddlePaddle/Paddle/pull/67444)、[#67494](https://github.com/PaddlePaddle/Paddle/pull/67494)、[#67499](https://github.com/PaddlePaddle/Paddle/pull/67499)、[#67267](https://github.com/PaddlePaddle/Paddle/pull/67267)、[#67567](https://github.com/PaddlePaddle/Paddle/pull/67567)、[#67455](https://github.com/PaddlePaddle/Paddle/pull/67455)、[#67161](https://github.com/PaddlePaddle/Paddle/pull/67161)、[#67581](https://github.com/PaddlePaddle/Paddle/pull/67581)、[#67539](https://github.com/PaddlePaddle/Paddle/pull/67539)、[#67625](https://github.com/PaddlePaddle/Paddle/pull/67625)、[#67690](https://github.com/PaddlePaddle/Paddle/pull/67690)、[#67454](https://github.com/PaddlePaddle/Paddle/pull/67454)、[#67731](https://github.com/PaddlePaddle/Paddle/pull/67731)、[#67734](https://github.com/PaddlePaddle/Paddle/pull/67734)、[#67735](https://github.com/PaddlePaddle/Paddle/pull/67735)、[#67607](https://github.com/PaddlePaddle/Paddle/pull/67607)、[#67413](https://github.com/PaddlePaddle/Paddle/pull/67413)、[#67387](https://github.com/PaddlePaddle/Paddle/pull/67387)、[#67882](https://github.com/PaddlePaddle/Paddle/pull/67882)、[#67864](https://github.com/PaddlePaddle/Paddle/pull/67864)、[#67503](https://github.com/PaddlePaddle/Paddle
/pull/67503)、[#67861](https://github.com/PaddlePaddle/Paddle/pull/67861)、[#67888](https://github.com/PaddlePaddle/Paddle/pull/67888)、[#67884](https://github.com/PaddlePaddle/Paddle/pull/67884)、[#67826](https://github.com/PaddlePaddle/Paddle/pull/67826)、[#68044](https://github.com/PaddlePaddle/Paddle/pull/68044)、[#67851](https://github.com/PaddlePaddle/Paddle/pull/67851)、[#68276](https://github.com/PaddlePaddle/Paddle/pull/68276)、[#69888](https://github.com/PaddlePaddle/Paddle/pull/69888)、[#70093](https://github.com/PaddlePaddle/Paddle/pull/70093)、[#70436](https://github.com/PaddlePaddle/Paddle/pull/70436)、[#70914](https://github.com/PaddlePaddle/Paddle/pull/70914)、[#71222](https://github.com/PaddlePaddle/Paddle/pull/71222)) +7. 优化了部分前端 Pass,提高前端处理流程的鲁棒性,提升计算密集型的子图性能。 ([#65142](https://github.com/PaddlePaddle/Paddle/pull/65142)、[#67466](https://github.com/PaddlePaddle/Paddle/pull/67466)、[#69228](https://github.com/PaddlePaddle/Paddle/pull/69228)、[#70994](https://github.com/PaddlePaddle/Paddle/pull/70994)、[#71226](https://github.com/PaddlePaddle/Paddle/pull/71226)、[#71297](https://github.com/PaddlePaddle/Paddle/pull/71297)、[#71443](https://github.com/PaddlePaddle/Paddle/pull/71443)) +8. 
设计了新的后端 IR 基础组件和相关 Pass 接口,提供更加简洁高效的优化策略开发方式,通过自动剪枝策略同时可有效降低后端 IR 的遍历开销。([#70485](https://github.com/PaddlePaddle/Paddle/pull/70485)、[#70765](https://github.com/PaddlePaddle/Paddle/pull/70765)、[#71042](https://github.com/PaddlePaddle/Paddle/pull/71042)、[#70952](https://github.com/PaddlePaddle/Paddle/pull/70952)、[#69454](https://github.com/PaddlePaddle/Paddle/pull/69454)、[#70361](https://github.com/PaddlePaddle/Paddle/pull/70361)、[#70334](https://github.com/PaddlePaddle/Paddle/pull/70334)、[#70406](https://github.com/PaddlePaddle/Paddle/pull/70406)、 [#70191](https://github.com/PaddlePaddle/Paddle/pull/70191)、[#70462](https://github.com/PaddlePaddle/Paddle/pull/70462)、[#70548](https://github.com/PaddlePaddle/Paddle/pull/70548)、[#70592](https://github.com/PaddlePaddle/Paddle/pull/70592)、[#70437](https://github.com/PaddlePaddle/Paddle/pull/70437)、[#70619](https://github.com/PaddlePaddle/Paddle/pull/70619)、[#70543](https://github.com/PaddlePaddle/Paddle/pull/70543)、[#69611](https://github.com/PaddlePaddle/Paddle/pull/69611)、[#70739](https://github.com/PaddlePaddle/Paddle/pull/70739)、[#70533](https://github.com/PaddlePaddle/Paddle/pull/70533)、[#70696](https://github.com/PaddlePaddle/Paddle/pull/70696)、[#70498](https://github.com/PaddlePaddle/Paddle/pull/70498)、[#70829](https://github.com/PaddlePaddle/Paddle/pull/70829)、[#71111](https://github.com/PaddlePaddle/Paddle/pull/71111)、[#70883](https://github.com/PaddlePaddle/Paddle/pull/70883)) ### Bug 修复 -1. 
修复了部分特殊算子在 Lowering 到编译器时 BUG,提升了端到端使用的用户体验([#60800](https://github.com/PaddlePaddle/Paddle/pull/60800)、[#64720](https://github.com/PaddlePaddle/Paddle/pull/64720)、[#62593](https://github.com/PaddlePaddle/Paddle/pull/62593)、[#62661](https://github.com/PaddlePaddle/Paddle/pull/62661)、[#64626](https://github.com/PaddlePaddle/Paddle/pull/64626)、[#63320](https://github.com/PaddlePaddle/Paddle/pull/63320)、[#64581](https://github.com/PaddlePaddle/Paddle/pull/64581)、[#61608](https://github.com/PaddlePaddle/Paddle/pull/61608)、[#64135](https://github.com/PaddlePaddle/Paddle/pull/64135)、[#64659](https://github.com/PaddlePaddle/Paddle/pull/64659)、[#62391](https://github.com/PaddlePaddle/Paddle/pull/62391)、[#62490](https://github.com/PaddlePaddle/Paddle/pull/62490)、[#63891](https://github.com/PaddlePaddle/Paddle/pull/63891)、[#64529](https://github.com/PaddlePaddle/Paddle/pull/64529)) -2. 修复了部分算子符号推导实现逻辑的 BUG([#62141](https://github.com/PaddlePaddle/Paddle/pull/62141)、[#62376](https://github.com/PaddlePaddle/Paddle/pull/62376)、[#62941](https://github.com/PaddlePaddle/Paddle/pull/62941)、[#63322](https://github.com/PaddlePaddle/Paddle/pull/63322)、[#64672](https://github.com/PaddlePaddle/Paddle/pull/64672)、[#64407](https://github.com/PaddlePaddle/Paddle/pull/64407)、[#60241](https://github.com/PaddlePaddle/Paddle/pull/60241)、[#60440](https://github.com/PaddlePaddle/Paddle/pull/60440)、[#62503](https://github.com/PaddlePaddle/Paddle/pull/62503)、[#62997](https://github.com/PaddlePaddle/Paddle/pull/62997)、[#63169](https://github.com/PaddlePaddle/Paddle/pull/63169)、[#61098](https://github.com/PaddlePaddle/Paddle/pull/61098)、[#63973](https://github.com/PaddlePaddle/Paddle/pull/63973)、[#62248](https://github.com/PaddlePaddle/Paddle/pull/62248)、[#62321](https://github.com/PaddlePaddle/Paddle/pull/62321)、[#63755](https://github.com/PaddlePaddle/Paddle/pull/63755)、[#63917](https://github.com/PaddlePaddle/Paddle/pull/63917)、[#63903](https://github.com/PaddlePaddle/Paddle/pull/63903)、[#64173](h
ttps://github.com/PaddlePaddle/Paddle/pull/64173)、[#64525](https://github.com/PaddlePaddle/Paddle/pull/64525)、[#64615](https://github.com/PaddlePaddle/Paddle/pull/64615)、[#62247](https://github.com/PaddlePaddle/Paddle/pull/62247)、[#62455](https://github.com/PaddlePaddle/Paddle/pull/62455)、[#62898](https://github.com/PaddlePaddle/Paddle/pull/62898)、[#62867](https://github.com/PaddlePaddle/Paddle/pull/62867)、[#63608](https://github.com/PaddlePaddle/Paddle/pull/63608)、[#63789](https://github.com/PaddlePaddle/Paddle/pull/63789)、[#64085](https://github.com/PaddlePaddle/Paddle/pull/64085)、[#64136](https://github.com/PaddlePaddle/Paddle/pull/64136)、[#64181](https://github.com/PaddlePaddle/Paddle/pull/64181)) -3. 修复了动静形状下编译器执行结果错误的诸多问题,提升了框架机制的鲁棒性([#60813](https://github.com/PaddlePaddle/Paddle/pull/60813)、[#61877](https://github.com/PaddlePaddle/Paddle/pull/61877)、[#61909](https://github.com/PaddlePaddle/Paddle/pull/61909)、[#62954](https://github.com/PaddlePaddle/Paddle/pull/62954)、[#63614](https://github.com/PaddlePaddle/Paddle/pull/63614)、[#60339](https://github.com/PaddlePaddle/Paddle/pull/60339)、[#60623](https://github.com/PaddlePaddle/Paddle/pull/60623)、[#60658](https://github.com/PaddlePaddle/Paddle/pull/60658)、[#60669](https://github.com/PaddlePaddle/Paddle/pull/60669)、[#58823](https://github.com/PaddlePaddle/Paddle/pull/58823)、[#62483](https://github.com/PaddlePaddle/Paddle/pull/62483)、[#62742](https://github.com/PaddlePaddle/Paddle/pull/62742)、[#61797](https://github.com/PaddlePaddle/Paddle/pull/61797)、[#63411](https://github.com/PaddlePaddle/Paddle/pull/63411)、[#64077](https://github.com/PaddlePaddle/Paddle/pull/64077)、[#62736](https://github.com/PaddlePaddle/Paddle/pull/62736)、[#62390](https://github.com/PaddlePaddle/Paddle/pull/62390)、[#63689](https://github.com/PaddlePaddle/Paddle/pull/63689)) -### 废弃功能 -1. 
移除了 adt DimExpr、SymbolicDimExpr、ShapedTypeInterface 等无用的符号相关组件([#60901](https://github.com/PaddlePaddle/Paddle/pull/60901)、[#60933](https://github.com/PaddlePaddle/Paddle/pull/60933)、[#60744](https://github.com/PaddlePaddle/Paddle/pull/60744)、[#64176](https://github.com/PaddlePaddle/Paddle/pull/64176)、[#64140](https://github.com/PaddlePaddle/Paddle/pull/64140)) -2. 移除了旧的 Group Cluster、旧 IR 下的前端表示等相关组件,提升架构层面的简洁性([#63683](https://github.com/PaddlePaddle/Paddle/pull/63683)、[#64630](https://github.com/PaddlePaddle/Paddle/pull/64630)、[#61380](https://github.com/PaddlePaddle/Paddle/pull/61380)) +1. 修复部分算子符号推导实现逻辑的 Bug。([#65185](https://github.com/PaddlePaddle/Paddle/pull/65185)、[#65231](https://github.com/PaddlePaddle/Paddle/pull/65231)、[#65266](https://github.com/PaddlePaddle/Paddle/pull/65266)、[#65951](https://github.com/PaddlePaddle/Paddle/pull/65951)、[#67142](https://github.com/PaddlePaddle/Paddle/pull/67142)、[#67286](https://github.com/PaddlePaddle/Paddle/pull/67286)、[#65958](https://github.com/PaddlePaddle/Paddle/pull/65958)、[#65955](https://github.com/PaddlePaddle/Paddle/pull/65955)、[#66470](https://github.com/PaddlePaddle/Paddle/pull/66470)、[#66764](https://github.com/PaddlePaddle/Paddle/pull/66764)、[#66036](https://github.com/PaddlePaddle/Paddle/pull/66036)、[#66662](https://github.com/PaddlePaddle/Paddle/pull/66662)、[#66741](https://github.com/PaddlePaddle/Paddle/pull/66741)、[#66745](https://github.com/PaddlePaddle/Paddle/pull/66745)、[#66807](https://github.com/PaddlePaddle/Paddle/pull/66807)、[#66791](https://github.com/PaddlePaddle/Paddle/pull/66791)、[#66859](https://github.com/PaddlePaddle/Paddle/pull/66859)、[#66880](https://github.com/PaddlePaddle/Paddle/pull/66880)、[#66962](https://github.com/PaddlePaddle/Paddle/pull/66962)) +2. 
修复部分特殊算子 Lowering 到编译器时的 Bug。([#68698](https://github.com/PaddlePaddle/Paddle/pull/68698)、[#68699](https://github.com/PaddlePaddle/Paddle/pull/68699)、[#68691](https://github.com/PaddlePaddle/Paddle/pull/68691)、[#68948](https://github.com/PaddlePaddle/Paddle/pull/68948)、[#70144](https://github.com/PaddlePaddle/Paddle/pull/70144)、[#70895](https://github.com/PaddlePaddle/Paddle/pull/70895))
+3. 修复算子融合在部分场景报错的问题。([#67038](https://github.com/PaddlePaddle/Paddle/pull/67038)、[#67400](https://github.com/PaddlePaddle/Paddle/pull/67400)、[#67655](https://github.com/PaddlePaddle/Paddle/pull/67655)、[#67723](https://github.com/PaddlePaddle/Paddle/pull/67723)、[#68029](https://github.com/PaddlePaddle/Paddle/pull/68029)、[#68042](https://github.com/PaddlePaddle/Paddle/pull/68042)、[#68888](https://github.com/PaddlePaddle/Paddle/pull/68888)、[#69250](https://github.com/PaddlePaddle/Paddle/pull/69250)、[#69937](https://github.com/PaddlePaddle/Paddle/pull/69937)、[#70924](https://github.com/PaddlePaddle/Paddle/pull/70924))
+4. 修复后端在处理极端值时的正确性问题,提高编译器的鲁棒性。([#68327](https://github.com/PaddlePaddle/Paddle/pull/68327))
+5. 修复后端 Schedule 和后处理调优过程的实现逻辑 Bug,解决部分 case 下的报错和性能问题。([#68605](https://github.com/PaddlePaddle/Paddle/pull/68605)、[#68937](https://github.com/PaddlePaddle/Paddle/pull/68937)、[#68587](https://github.com/PaddlePaddle/Paddle/pull/68587)、[#69060](https://github.com/PaddlePaddle/Paddle/pull/69060)、[#69608](https://github.com/PaddlePaddle/Paddle/pull/69608)、[#71471](https://github.com/PaddlePaddle/Paddle/pull/71471)、[#71068](https://github.com/PaddlePaddle/Paddle/pull/71068))
+6. 修复算子融合过程中存在随机性的问题。([#69547](https://github.com/PaddlePaddle/Paddle/pull/69547)、[#70931](https://github.com/PaddlePaddle/Paddle/pull/70931))
+
+## 4. 自动并行架构
+
+在 3.0 正式版中,我们对自动并行架构进行了深入的验证和打磨,以更好地支持纯文稠密模型、纯文稀疏模型(MoE)和多模态理解模型等常见大模型场景的预训练+精调流程。具体而言,我们针对这些场景新增了 20+算子的切分推导规则,并支持将自动并行训练参数转化成手动并行参数进行下游推理,使自动并行达到了全面可用的状态,帮助用户降低大模型并行程序的开发成本。同时,为了进一步简化用户的分布式开发流程,我们推出了一个新的`paddle.distributed.parallel`接口,基于对分布式张量标记语法的封装,支持用户在模型组网外不侵入地配置数据并行、模型并行、流水并行等常见的并行策略。此外,静态图自动并行架构基于 PIR 完成了全面的升级,底层的基础组件、核心模块、并行策略和性能优化策略均统一基于扩展的 PIR `DistDialect`进行实现,进一步增强了自动并行的动静一致性,并在 Llama 系列模型上性能达到了持平甚至领先手动并行方式的水平。
+
+### 新特性
+
+- 新增`paddle.distributed.parallel`接口,支持在模型组网外配置常见并行策略,简化分布式开发流程。[#69004](https://github.com/PaddlePaddle/Paddle/pull/69004), [#69033](https://github.com/PaddlePaddle/Paddle/pull/69033), [#69077](https://github.com/PaddlePaddle/Paddle/pull/69077), [#69136](https://github.com/PaddlePaddle/Paddle/pull/69136), [#69169](https://github.com/PaddlePaddle/Paddle/pull/69169), [#69212](https://github.com/PaddlePaddle/Paddle/pull/69212), [#69217](https://github.com/PaddlePaddle/Paddle/pull/69217), [#69283](https://github.com/PaddlePaddle/Paddle/pull/69283), [#69288](https://github.com/PaddlePaddle/Paddle/pull/69288), [#69326](https://github.com/PaddlePaddle/Paddle/pull/69326), [#69365](https://github.com/PaddlePaddle/Paddle/pull/69365), [#69384](https://github.com/PaddlePaddle/Paddle/pull/69384), [#69426](https://github.com/PaddlePaddle/Paddle/pull/69426), [#69443](https://github.com/PaddlePaddle/Paddle/pull/69443), [#69462](https://github.com/PaddlePaddle/Paddle/pull/69462), [#69492](https://github.com/PaddlePaddle/Paddle/pull/69492), [#69628](https://github.com/PaddlePaddle/Paddle/pull/69628), [#69677](https://github.com/PaddlePaddle/Paddle/pull/69677), [#69697](https://github.com/PaddlePaddle/Paddle/pull/69697), [#69776](https://github.com/PaddlePaddle/Paddle/pull/69776), [#69896](https://github.com/PaddlePaddle/Paddle/pull/69896), [#70138](https://github.com/PaddlePaddle/Paddle/pull/70138), [#70182](https://github.com/PaddlePaddle/Paddle/pull/70182), [#70539](https://github.com/PaddlePaddle/Paddle/pull/70539),
[#71116](https://github.com/PaddlePaddle/Paddle/pull/71116), [#71210](https://github.com/PaddlePaddle/Paddle/pull/71210) +- 面向纯文稀疏场景支持 MoE 专家并行,实现专家并行变 mesh 切分转换机制并支持自动调用 all2all 通信。[#66462](https://github.com/PaddlePaddle/Paddle/pull/66462), [#66750](https://github.com/PaddlePaddle/Paddle/pull/66750), [#68004](https://github.com/PaddlePaddle/Paddle/pull/68004), [#68053](https://github.com/PaddlePaddle/Paddle/pull/68053), [#68187](https://github.com/PaddlePaddle/Paddle/pull/68187), [#68477](https://github.com/PaddlePaddle/Paddle/pull/68477), [#69098](https://github.com/PaddlePaddle/Paddle/pull/69098), [#69262](https://github.com/PaddlePaddle/Paddle/pull/69262), [#69296](https://github.com/PaddlePaddle/Paddle/pull/69296), [#70715](https://github.com/PaddlePaddle/Paddle/pull/70715), [#71292](https://github.com/PaddlePaddle/Paddle/pull/71292), [#71320](https://github.com/PaddlePaddle/Paddle/pull/71320) +- 为了满足极致手工优化场景下用户自行管理切分状态和通信操作的需求,同时解决部分非 SPMD 场景下无法使用张量切分语法的问题,我们新增了`LocalLayer`接口,支持自动并行和手动并行混合组网。[#70519](https://github.com/PaddlePaddle/Paddle/pull/70519), [#70525](https://github.com/PaddlePaddle/Paddle/pull/70525), [#70600](https://github.com/PaddlePaddle/Paddle/pull/70600), [#71232](https://github.com/PaddlePaddle/Paddle/pull/71232), [#71264](https://github.com/PaddlePaddle/Paddle/pull/71264), [#71373](https://github.com/PaddlePaddle/Paddle/pull/71373) +- 为了让用户可以使用国产硬件运行自动并行程序,完成了对昆仑芯片的适配,其它芯片的支持也在进行中。[#70997](https://github.com/PaddlePaddle/Paddle/pull/70997), [#71126](https://github.com/PaddlePaddle/Paddle/pull/71126), [#71229](https://github.com/PaddlePaddle/Paddle/pull/71229), [#71289](https://github.com/PaddlePaddle/Paddle/pull/71289), [#71425](https://github.com/PaddlePaddle/Paddle/pull/71425), [#71500](https://github.com/PaddlePaddle/Paddle/pull/71500) +- 针对数据维度无法整除设备维度的情况,支持了非均衡的切分推导和切分转换。[#66103](https://github.com/PaddlePaddle/Paddle/pull/66103), [#67756](https://github.com/PaddlePaddle/Paddle/pull/67756), 
[#69265](https://github.com/PaddlePaddle/Paddle/pull/69265), [#70072](https://github.com/PaddlePaddle/Paddle/pull/70072) +- 对 shard_dataloader 功能进行了升级,支持通过`batch_sampler`设置梯度累加步数,同时支持模型多输入的场景。[#65325](https://github.com/PaddlePaddle/Paddle/pull/65325), [#70659](https://github.com/PaddlePaddle/Paddle/pull/70659) +- 对参数保存和加载功能进行了升级,支持参数异步存储、支持动态图和静态图互相加载`master_weight`、同时支持参数版本控制和 offload 功能。[#66858](https://github.com/PaddlePaddle/Paddle/pull/66858), [#67427](https://github.com/PaddlePaddle/Paddle/pull/67427), [#70105](https://github.com/PaddlePaddle/Paddle/pull/70105), [#70639](https://github.com/PaddlePaddle/Paddle/pull/70639) +- 为了满足用户对含有`PyLayer`的组网进行动转静的需求,在静态图模式下对`PyLayer`进行了支持,允许在`PyLayer`内部运行分布式张量。[#67326](https://github.com/PaddlePaddle/Paddle/pull/67326), [#68190](https://github.com/PaddlePaddle/Paddle/pull/68190), [#69089](https://github.com/PaddlePaddle/Paddle/pull/69089), [#70831](https://github.com/PaddlePaddle/Paddle/pull/70831) +- 为了解决数据流输入格式与模型动转静实际需要的`input_spec`不一致导致无法正确动转静的问题,对动转静接口支持了用户自定义`input_spec`功能,允许用户自行传入需要的`input_spec`。[#69183](https://github.com/PaddlePaddle/Paddle/pull/69183) +- 针对混合并行场景,对梯度裁剪策略进行了适配和支持。[#65259](https://github.com/PaddlePaddle/Paddle/pull/65259), [#65928](https://github.com/PaddlePaddle/Paddle/pull/65928), [#69287](https://github.com/PaddlePaddle/Paddle/pull/69287), [#69760](https://github.com/PaddlePaddle/Paddle/pull/69760), [#71421](https://github.com/PaddlePaddle/Paddle/pull/71421) +- 针对模型层数不整除设备数的场景,支持非均衡流水并行策略,允许用户在不同流水阶段切分数量不同的网络层。[#69728](https://github.com/PaddlePaddle/Paddle/pull/69728), [#70164](https://github.com/PaddlePaddle/Paddle/pull/70164), [#70230](https://github.com/PaddlePaddle/Paddle/pull/70230) +- 新增`set_mesh`和`get_mesh`接口,支持用户方便地设置和获取全局 mesh。[#69999](https://github.com/PaddlePaddle/Paddle/pull/69999) +- 新增自动并行和手动并行精度对齐开关,方便将已有的手动并行模型改写成自动并行后验证精度正确性。[#67681](https://github.com/PaddlePaddle/Paddle/pull/67681) + +### 功能改进 + +对于算子切分推导规则进行完善和优化 + +- 
新增`add_n`、`split`和`softmax_grad`算子切分推导规则。[#65606](https://github.com/PaddlePaddle/Paddle/pull/65606), [#69439](https://github.com/PaddlePaddle/Paddle/pull/69439) +- 新增`assign`和`embedding_grad`算子切分推导规则。[#67457](https://github.com/PaddlePaddle/Paddle/pull/67457) +- 新增`clip`算子切分推导规则。[#70632](https://github.com/PaddlePaddle/Paddle/pull/70632) +- 新增`dist_stack`和`gather_nd`算子切分推导规则。[#65426](https://github.com/PaddlePaddle/Paddle/pull/65426) +- 新增`dropout`算子切分推导规则。[#70216](https://github.com/PaddlePaddle/Paddle/pull/70216) +- 新增`fused_dropout_add`算子切分推导规则。[#67722](https://github.com/PaddlePaddle/Paddle/pull/67722) +- 新增`fast_ln`自定义算子切分推导规则。[#68148](https://github.com/PaddlePaddle/Paddle/pull/68148) +- 新增`greater_equal`和`less_equal`算子切分推导规则。[#68868](https://github.com/PaddlePaddle/Paddle/pull/68868) +- 新增`greater_than`和`less_than`算子切分推导规则。[#68133](https://github.com/PaddlePaddle/Paddle/pull/68133) +- 新增`if`算子切分推导规则。[#69357](https://github.com/PaddlePaddle/Paddle/pull/69357) +- 新增`logical_and`、`logical_not`、`logical_or`和`logical_xor`算子切分推导规则。[#67840](https://github.com/PaddlePaddle/Paddle/pull/67840) +- 新增`logsumexp`算子切分推导规则。[#67840](https://github.com/PaddlePaddle/Paddle/pull/67840) +- 新增`non_zero`算子切分推导规则。[#67996](https://github.com/PaddlePaddle/Paddle/pull/67996) +- 新增`pad`算子切分推导规则。[#68304](https://github.com/PaddlePaddle/Paddle/pull/68304) +- 新增`p_norm`算子切分推导规则。[#68317](https://github.com/PaddlePaddle/Paddle/pull/68317) +- 新增`scatter_nd`算子切分推导规则。[#67980](https://github.com/PaddlePaddle/Paddle/pull/67980) +- 新增`sigmoid`算子切分推导规则。[#71092](https://github.com/PaddlePaddle/Paddle/pull/71092) + +静态图自动并行架构基于 PIR 升级 + +- 混合精度训练(AMP)升级。[#65089](https://github.com/PaddlePaddle/Paddle/pull/65089), [#65892](https://github.com/PaddlePaddle/Paddle/pull/65892), [#66418](https://github.com/PaddlePaddle/Paddle/pull/66418), [#66674](https://github.com/PaddlePaddle/Paddle/pull/66674), [#68545](https://github.com/PaddlePaddle/Paddle/pull/68545) +- 
重计算策略升级。[#69681](https://github.com/PaddlePaddle/Paddle/pull/69681), [#70064](https://github.com/PaddlePaddle/Paddle/pull/70064) +- 参数切片并行策略升级。[#63542](https://github.com/PaddlePaddle/Paddle/pull/63542), [#67748](https://github.com/PaddlePaddle/Paddle/pull/67748), [#68288](https://github.com/PaddlePaddle/Paddle/pull/68288), [#68314](https://github.com/PaddlePaddle/Paddle/pull/68314), [#69059](https://github.com/PaddlePaddle/Paddle/pull/69059), [#71167](https://github.com/PaddlePaddle/Paddle/pull/71167) +- 流水并行策略升级。[#66810](https://github.com/PaddlePaddle/Paddle/pull/66810), [#67174](https://github.com/PaddlePaddle/Paddle/pull/67174), [#67522](https://github.com/PaddlePaddle/Paddle/pull/67522), [#68141](https://github.com/PaddlePaddle/Paddle/pull/68141), [#68742](https://github.com/PaddlePaddle/Paddle/pull/68742), [#68962](https://github.com/PaddlePaddle/Paddle/pull/68962), [#69052](https://github.com/PaddlePaddle/Paddle/pull/69052), [#69201](https://github.com/PaddlePaddle/Paddle/pull/69201), [#69244](https://github.com/PaddlePaddle/Paddle/pull/69244), [#69578](https://github.com/PaddlePaddle/Paddle/pull/69578), [#69584](https://github.com/PaddlePaddle/Paddle/pull/69584), [#69654](https://github.com/PaddlePaddle/Paddle/pull/69654), [#69799](https://github.com/PaddlePaddle/Paddle/pull/69799), [#69894](https://github.com/PaddlePaddle/Paddle/pull/69894), [#70360](https://github.com/PaddlePaddle/Paddle/pull/70360), [#70615](https://github.com/PaddlePaddle/Paddle/pull/70615) +- 梯度累加策略升级。[#66641](https://github.com/PaddlePaddle/Paddle/pull/66641), [#67254](https://github.com/PaddlePaddle/Paddle/pull/67254), [#67907](https://github.com/PaddlePaddle/Paddle/pull/67907), [#68391](https://github.com/PaddlePaddle/Paddle/pull/68391), [#68460](https://github.com/PaddlePaddle/Paddle/pull/68460), [#68472](https://github.com/PaddlePaddle/Paddle/pull/68472), [#68664](https://github.com/PaddlePaddle/Paddle/pull/68664), [#68727](https://github.com/PaddlePaddle/Paddle/pull/68727), 
[#69171](https://github.com/PaddlePaddle/Paddle/pull/69171), [#69805](https://github.com/PaddlePaddle/Paddle/pull/69805)
+- 算子融合策略升级。[#68087](https://github.com/PaddlePaddle/Paddle/pull/68087), [#68207](https://github.com/PaddlePaddle/Paddle/pull/68207), [#68383](https://github.com/PaddlePaddle/Paddle/pull/68383), [#68623](https://github.com/PaddlePaddle/Paddle/pull/68623), [#68650](https://github.com/PaddlePaddle/Paddle/pull/68650), [#68736](https://github.com/PaddlePaddle/Paddle/pull/68736), [#69103](https://github.com/PaddlePaddle/Paddle/pull/69103), [#70889](https://github.com/PaddlePaddle/Paddle/pull/70889)
+- `tensor_fusion`优化策略升级。[#66130](https://github.com/PaddlePaddle/Paddle/pull/66130), [#68475](https://github.com/PaddlePaddle/Paddle/pull/68475), [#69243](https://github.com/PaddlePaddle/Paddle/pull/69243), [#69560](https://github.com/PaddlePaddle/Paddle/pull/69560), [#69823](https://github.com/PaddlePaddle/Paddle/pull/69823), [#70195](https://github.com/PaddlePaddle/Paddle/pull/70195), [#70309](https://github.com/PaddlePaddle/Paddle/pull/70309), [#70363](https://github.com/PaddlePaddle/Paddle/pull/70363), [#70869](https://github.com/PaddlePaddle/Paddle/pull/70869)
+- 张量并行优化策略升级。[#68182](https://github.com/PaddlePaddle/Paddle/pull/68182), [#68389](https://github.com/PaddlePaddle/Paddle/pull/68389)
+- 自定义算子切分推导机制升级。[#67614](https://github.com/PaddlePaddle/Paddle/pull/67614)
+- 参数保存和加载机制升级。[#66416](https://github.com/PaddlePaddle/Paddle/pull/66416), [#67045](https://github.com/PaddlePaddle/Paddle/pull/67045), [#67369](https://github.com/PaddlePaddle/Paddle/pull/67369), [#68203](https://github.com/PaddlePaddle/Paddle/pull/68203)
+- 计算图编译时间优化。[#68796](https://github.com/PaddlePaddle/Paddle/pull/68796)
+
+### Bug 修复
+
+- 修复切分推导机制及若干算子的切分推导规则 bug。[#65702](https://github.com/PaddlePaddle/Paddle/pull/65702), [#65835](https://github.com/PaddlePaddle/Paddle/pull/65835), [#66098](https://github.com/PaddlePaddle/Paddle/pull/66098),
[#66955](https://github.com/PaddlePaddle/Paddle/pull/66955), [#67052](https://github.com/PaddlePaddle/Paddle/pull/67052), [#67059](https://github.com/PaddlePaddle/Paddle/pull/67059), [#67101](https://github.com/PaddlePaddle/Paddle/pull/67101), [#67283](https://github.com/PaddlePaddle/Paddle/pull/67283), [#67729](https://github.com/PaddlePaddle/Paddle/pull/67729), [#67996](https://github.com/PaddlePaddle/Paddle/pull/67996), [#68413](https://github.com/PaddlePaddle/Paddle/pull/68413), [#68455](https://github.com/PaddlePaddle/Paddle/pull/68455), [#68533](https://github.com/PaddlePaddle/Paddle/pull/68533), [#68976](https://github.com/PaddlePaddle/Paddle/pull/68976), [#68977](https://github.com/PaddlePaddle/Paddle/pull/68977), [#69027](https://github.com/PaddlePaddle/Paddle/pull/69027), [#69203](https://github.com/PaddlePaddle/Paddle/pull/69203), [#69223](https://github.com/PaddlePaddle/Paddle/pull/69223), [#69862](https://github.com/PaddlePaddle/Paddle/pull/69862), [#69991](https://github.com/PaddlePaddle/Paddle/pull/69991), [#70100](https://github.com/PaddlePaddle/Paddle/pull/70100), [#70624](https://github.com/PaddlePaddle/Paddle/pull/70624), [#71024](https://github.com/PaddlePaddle/Paddle/pull/71024), [#71152](https://github.com/PaddlePaddle/Paddle/pull/71152), [#71214](https://github.com/PaddlePaddle/Paddle/pull/71214), [#71253](https://github.com/PaddlePaddle/Paddle/pull/71253), [#71388](https://github.com/PaddlePaddle/Paddle/pull/71388) +- 修复切分转换机制的若干 bug。[#65060](https://github.com/PaddlePaddle/Paddle/pull/65060), [#65820](https://github.com/PaddlePaddle/Paddle/pull/65820), [#67630](https://github.com/PaddlePaddle/Paddle/pull/67630), [#67809](https://github.com/PaddlePaddle/Paddle/pull/67809), [#68115](https://github.com/PaddlePaddle/Paddle/pull/68115), [#68468](https://github.com/PaddlePaddle/Paddle/pull/68468), [#70023](https://github.com/PaddlePaddle/Paddle/pull/70023) +- 修复参数切片并行中`shard_degree`推导错误的 
bug。[#68781](https://github.com/PaddlePaddle/Paddle/pull/68781), [#69214](https://github.com/PaddlePaddle/Paddle/pull/69214)
+- 修复`shard_dataloader`动态图和静态图结果不一致,以及切分 dict 类型数据、自定义`sampler`等场景下的问题。[#65262](https://github.com/PaddlePaddle/Paddle/pull/65262), [#66096](https://github.com/PaddlePaddle/Paddle/pull/66096), [#66882](https://github.com/PaddlePaddle/Paddle/pull/66882), [#69620](https://github.com/PaddlePaddle/Paddle/pull/69620)
+- 修复`recompute`设置`use_reentrant=false`时和参数切分不兼容的 bug。[#65188](https://github.com/PaddlePaddle/Paddle/pull/65188)
+- 修复参数加载和保存功能的 bug。[#66266](https://github.com/PaddlePaddle/Paddle/pull/66266), [#69764](https://github.com/PaddlePaddle/Paddle/pull/69764)
+- 修复`Conv2D`、`fill_constant`、`flash_attn_grad`、`reduce_scatter`、`if`、`tuple_push`和`tuple_pop`等算子的 bug。[#67587](https://github.com/PaddlePaddle/Paddle/pull/67587), [#68008](https://github.com/PaddlePaddle/Paddle/pull/68008), [#68586](https://github.com/PaddlePaddle/Paddle/pull/68586), [#68589](https://github.com/PaddlePaddle/Paddle/pull/68589), [#69519](https://github.com/PaddlePaddle/Paddle/pull/69519), [#70207](https://github.com/PaddlePaddle/Paddle/pull/70207)
+- 修复`reduce_scatter`、`p_send`、`p_recv`等通信算子的 bug。[#67386](https://github.com/PaddlePaddle/Paddle/pull/67386), [#71433](https://github.com/PaddlePaddle/Paddle/pull/71433)
+- 修复张量类型提升的 bug。[#66541](https://github.com/PaddlePaddle/Paddle/pull/66541), [#68342](https://github.com/PaddlePaddle/Paddle/pull/68342)
+- 修复在部分卡上未初始化的分布式张量转 numpy 时自动分配显存的 bug。[#66361](https://github.com/PaddlePaddle/Paddle/pull/66361)
+- 修复非切分张量调用`to_tensor`时触发数据拷贝的 bug。[#67169](https://github.com/PaddlePaddle/Paddle/pull/67169)
+- 修复`scaler`参数切分的 bug。[#68289](https://github.com/PaddlePaddle/Paddle/pull/68289)
+- 修复`enable_delay_scale_loss`精度问题。[#68525](https://github.com/PaddlePaddle/Paddle/pull/68525)
+- 修复通信组创建顺序不同导致的 hang 问题。[#68847](https://github.com/PaddlePaddle/Paddle/pull/68847)
+- 修复静态图场景下`op_role`设置错误的
bug。[#67850](https://github.com/PaddlePaddle/Paddle/pull/67850), [#67986](https://github.com/PaddlePaddle/Paddle/pull/67986), [#68156](https://github.com/PaddlePaddle/Paddle/pull/68156)
+- 修复静态图下无法切分随机数算子输出变量的 bug。[#67589](https://github.com/PaddlePaddle/Paddle/pull/67589), [#67750](https://github.com/PaddlePaddle/Paddle/pull/67750), [#68067](https://github.com/PaddlePaddle/Paddle/pull/68067)
+- 修复静态图下计算图 cache 机制失效的 bug。[#68488](https://github.com/PaddlePaddle/Paddle/pull/68488)
+- 修复`paddle.distributed.to_distributed`索引越界的 bug。[#70174](https://github.com/PaddlePaddle/Paddle/pull/70174)
+- 修复流水并行可视化工具的 bug。[#71386](https://github.com/PaddlePaddle/Paddle/pull/71386)
+
+## 5. 算子机制
+
+算子相关 PR,包括组合算子拆分、新硬件适配算子 kernel、稀疏算子运算、旧 IR 算子退场等工作,为 PIR 适配编译器、多硬件并取得性能优势奠定了基础;同时规范了算子体系,优化了代码结构,减少了技术债,并提升了可维护性。
+
+### 新特性
+
+- 支持组合算子拆分。 [#65148](https://github.com/PaddlePaddle/Paddle/pull/65148), [#65007](https://github.com/PaddlePaddle/Paddle/pull/65007), [#65482](https://github.com/PaddlePaddle/Paddle/pull/65482), [#65006](https://github.com/PaddlePaddle/Paddle/pull/65006), [#65692](https://github.com/PaddlePaddle/Paddle/pull/65692), [#65961](https://github.com/PaddlePaddle/Paddle/pull/65961), [#65968](https://github.com/PaddlePaddle/Paddle/pull/65968), [#65967](https://github.com/PaddlePaddle/Paddle/pull/65967), [#66510](https://github.com/PaddlePaddle/Paddle/pull/66510), [#66795](https://github.com/PaddlePaddle/Paddle/pull/66795), [#66835](https://github.com/PaddlePaddle/Paddle/pull/66835), [#67151](https://github.com/PaddlePaddle/Paddle/pull/67151), [#67342](https://github.com/PaddlePaddle/Paddle/pull/67342), [#67481](https://github.com/PaddlePaddle/Paddle/pull/67481), [#67502](https://github.com/PaddlePaddle/Paddle/pull/67502), [#67606](https://github.com/PaddlePaddle/Paddle/pull/67606), [#67757](https://github.com/PaddlePaddle/Paddle/pull/67757), [#67775](https://github.com/PaddlePaddle/Paddle/pull/67775), [#67891](https://github.com/PaddlePaddle/Paddle/pull/67891),
[#67790](https://github.com/PaddlePaddle/Paddle/pull/67790), [#67965](https://github.com/PaddlePaddle/Paddle/pull/67965), [#67968](https://github.com/PaddlePaddle/Paddle/pull/67968), [#68168](https://github.com/PaddlePaddle/Paddle/pull/68168), [#68125](https://github.com/PaddlePaddle/Paddle/pull/68125), [#68228](https://github.com/PaddlePaddle/Paddle/pull/68228), [#68295](https://github.com/PaddlePaddle/Paddle/pull/68295), [#68353](https://github.com/PaddlePaddle/Paddle/pull/68353), [#68357](https://github.com/PaddlePaddle/Paddle/pull/68357), [#68827](https://github.com/PaddlePaddle/Paddle/pull/68827), [#68834](https://github.com/PaddlePaddle/Paddle/pull/68834), [#69239](https://github.com/PaddlePaddle/Paddle/pull/69239), [#68817](https://github.com/PaddlePaddle/Paddle/pull/68817), [#69108](https://github.com/PaddlePaddle/Paddle/pull/69108), [#69373](https://github.com/PaddlePaddle/Paddle/pull/69373), [#69372](https://github.com/PaddlePaddle/Paddle/pull/69372), [#68829](https://github.com/PaddlePaddle/Paddle/pull/68829), [#69684](https://github.com/PaddlePaddle/Paddle/pull/69684), [#68818](https://github.com/PaddlePaddle/Paddle/pull/68818), [#68835](https://github.com/PaddlePaddle/Paddle/pull/68835), [#69838](https://github.com/PaddlePaddle/Paddle/pull/69838), [#69998](https://github.com/PaddlePaddle/Paddle/pull/69998), [#69675](https://github.com/PaddlePaddle/Paddle/pull/69675), [#70367](https://github.com/PaddlePaddle/Paddle/pull/70367), [#70080](https://github.com/PaddlePaddle/Paddle/pull/70080), [#71352](https://github.com/PaddlePaddle/Paddle/pull/71352), [#66450](https://github.com/PaddlePaddle/Paddle/pull/66450), [#67593](https://github.com/PaddlePaddle/Paddle/pull/67593), [#67988](https://github.com/PaddlePaddle/Paddle/pull/67988), [#68346](https://github.com/PaddlePaddle/Paddle/pull/68346), [#68399](https://github.com/PaddlePaddle/Paddle/pull/68399), [#68319](https://github.com/PaddlePaddle/Paddle/pull/68319), 
[#68485](https://github.com/PaddlePaddle/Paddle/pull/68485), [#68961](https://github.com/PaddlePaddle/Paddle/pull/68961), [#68575](https://github.com/PaddlePaddle/Paddle/pull/68575) +- PIR 支持 PyLayer。 [#69674](https://github.com/PaddlePaddle/Paddle/pull/69674), [#70375](https://github.com/PaddlePaddle/Paddle/pull/70375) +- 支持 XPU 相关算子计算。 [#65684](https://github.com/PaddlePaddle/Paddle/pull/65684), [#65976](https://github.com/PaddlePaddle/Paddle/pull/65976), [#68497](https://github.com/PaddlePaddle/Paddle/pull/68497) +- PIR 支持稀疏算子。 [#62663](https://github.com/PaddlePaddle/Paddle/pull/62663), [#67885](https://github.com/PaddlePaddle/Paddle/pull/67885), [#67976](https://github.com/PaddlePaddle/Paddle/pull/67976), [#68261](https://github.com/PaddlePaddle/Paddle/pull/68261), [#68326](https://github.com/PaddlePaddle/Paddle/pull/68326) +- 支持手动 Recompute。 [#65879](https://github.com/PaddlePaddle/Paddle/pull/65879) +- 实现 kernel 并注册算子。 [#63130](https://github.com/PaddlePaddle/Paddle/pull/63130) +- 支持 Custom Op。 [#68824](https://github.com/PaddlePaddle/Paddle/pull/68824), [#68748](https://github.com/PaddlePaddle/Paddle/pull/68748) +- 添加 acos 的动态图二阶反向组合。 [#70409](https://github.com/PaddlePaddle/Paddle/pull/70409) +- 支持 0-size tensor 的初始化和计算。 [#70504](https://github.com/PaddlePaddle/Paddle/pull/70504) -## 4.自动并行架构 -为了进一步增强自动并行(Auto Parallel)架构在大模型训练场景的可用性,飞桨完善了动-静态图自动并行的功能,包括新增 Sharding、interleaved pipeline 等并行策略,支持 lazy 初始化参数,新增和完善部分算子的切分推导规则等,并在多个主流大语言模型中全面验证了自动并行架构。同时,为打造飞桨全新 3.0 架构,静态图自动并行架构基于新一代中间表示 PIR 进行了全面升级,扩展实现了 DistDialect,在计算图表示中原生支持了分布式属性(DistAttr)和分布式张量(DistTensor),并打通了静态图自动并行全流程,进一步增强了自动并行的动静统一和飞桨架构的统一性。最后,新增和完善了多项性能优化技术,包括 zero bubble pipeline 调度策略等,在 Llama-2 13B/70B 等典型大模型上实现端到端训练性能持平或领先手动并行方式。 +### Bug 修复 -### 功能完善 -- 新增 dtensor_from_local 接口,用于从切分后的局部张量创建 DistTensor(与之对应的,shard_tensor 是从切分前的全局张量创建 DistTensor)。[#60206](https://github.com/PaddlePaddle/Paddle/pull/60206) -- 新增 unshard_tensor 接口,用于将 DistTensor 转为全局张量,该接口与 shard_tensor 
是互逆操作。[#60272](https://github.com/PaddlePaddle/Paddle/pull/60272) -- 为减少训练时的显存占用,新增 Sharding 策略,包括 stage1,stage2 和 stage3。[#61926](https://github.com/PaddlePaddle/Paddle/pull/61926), [#62711](https://github.com/PaddlePaddle/Paddle/pull/62711), [#62486](https://github.com/PaddlePaddle/Paddle/pull/62486), [#62230](https://github.com/PaddlePaddle/Paddle/pull/62230) -- 为解决先初始化参数再切分参数时可能出现的显存不足问题,新增自动并行参数 LazyInit 功能,支持先切分参数,再初始化参数。[#60316](https://github.com/PaddlePaddle/Paddle/pull/60316), [#60441](https://github.com/PaddlePaddle/Paddle/pull/60441), [#60563](https://github.com/PaddlePaddle/Paddle/pull/60563), [#61792](https://github.com/PaddlePaddle/Paddle/pull/61792) -- 为减少流水线并行的 bubble,新增 interleaved pipeline 并行策略,同时支持通过配置的方式自动将用户组网的 pipeline 并行自动转为 interleaved pipeline 并行,让用户无需在组网中进行复杂的标记。[#59751](https://github.com/PaddlePaddle/Paddle/pull/59751), [#60050](https://github.com/PaddlePaddle/Paddle/pull/60050), [#60467](https://github.com/PaddlePaddle/Paddle/pull/60467), [#60868](https://github.com/PaddlePaddle/Paddle/pull/60868), [#60187](https://github.com/PaddlePaddle/Paddle/pull/60187), [#62884](https://github.com/PaddlePaddle/Paddle/pull/62884), [#60560](https://github.com/PaddlePaddle/Paddle/pull/60560), [#61541](https://github.com/PaddlePaddle/Paddle/pull/61541) -- 新增 stack, gather, scatter_grad, cumsum, unbind, swiglu, fused_linear_param_grad 等算子的切分推导规则,完善和优化 fused_rope, reshape, flatten, fused_rms_norm, slice, tile, flash_attn, cross_entropy 等算子切分推导规则实现,解决在部分模型组网场景中不兼容的问题。[#62720](https://github.com/PaddlePaddle/Paddle/pull/62720), [#64202](https://github.com/PaddlePaddle/Paddle/pull/64202), [#63361](https://github.com/PaddlePaddle/Paddle/pull/63361), [#63290](https://github.com/PaddlePaddle/Paddle/pull/63290), [#61460](https://github.com/PaddlePaddle/Paddle/pull/61460), [#59986](https://github.com/PaddlePaddle/Paddle/pull/59986), [#61184](https://github.com/PaddlePaddle/Paddle/pull/61184), [#60144](https://github.com/PaddlePaddle/Paddle/pull/60144), 
[#62525](https://github.com/PaddlePaddle/Paddle/pull/62525), [#62053](https://github.com/PaddlePaddle/Paddle/pull/62053), [#60709](https://github.com/PaddlePaddle/Paddle/pull/60709), [#60111](https://github.com/PaddlePaddle/Paddle/pull/60111), [#63681](https://github.com/PaddlePaddle/Paddle/pull/63681), [#62180](https://github.com/PaddlePaddle/Paddle/pull/62180), [#60794](https://github.com/PaddlePaddle/Paddle/pull/60794), [#60632](https://github.com/PaddlePaddle/Paddle/pull/60632), [#62439](https://github.com/PaddlePaddle/Paddle/pull/62439) -- 完善分布式 checkpoint 存储和加载功能,支持 master_weights 存储,修复随机挂问题。[#60027](https://github.com/PaddlePaddle/Paddle/pull/60027), [#59872](https://github.com/PaddlePaddle/Paddle/pull/59872) -- 为支持任意 shape 张量的自动并行,新增支持张量非均匀切分特性。[#62611](https://github.com/PaddlePaddle/Paddle/pull/62611), [#61432](https://github.com/PaddlePaddle/Paddle/pull/61432) -- 为支持用户在自动并行组网中使用自定义算子,支持用户在框架外注册自定义该类算子的切分推导规则。 [#60509](https://github.com/PaddlePaddle/Paddle/pull/60509) -- 完善切分转换规则,支持从任意状态转为 replicate 以及从 replicate 状态转换为任意状态。[#60281](https://github.com/PaddlePaddle/Paddle/pull/60281), [#59869](https://github.com/PaddlePaddle/Paddle/pull/59869) -- 新增 MoE 专家并行策略(experimental),目前仅支持动态图自动并行。[#63904](https://github.com/PaddlePaddle/Paddle/pull/63904) -- 修复自动并行与动态图执行、动转静等流程适配的部分问题。[#60214](https://github.com/PaddlePaddle/Paddle/pull/60214), [#60546](https://github.com/PaddlePaddle/Paddle/pull/60546), [#62082](https://github.com/PaddlePaddle/Paddle/pull/62082), [#61313](https://github.com/PaddlePaddle/Paddle/pull/61313), [#61840](https://github.com/PaddlePaddle/Paddle/pull/61840), [#60614](https://github.com/PaddlePaddle/Paddle/pull/60614), [#60234](https://github.com/PaddlePaddle/Paddle/pull/60234), [#64813](https://github.com/PaddlePaddle/Paddle/pull/64813), [#61606](https://github.com/PaddlePaddle/Paddle/pull/61606), [#63405](https://github.com/PaddlePaddle/Paddle/pull/63405), [#64334](https://github.com/PaddlePaddle/Paddle/pull/64334), 
[#60504](https://github.com/PaddlePaddle/Paddle/pull/60504) +- 修复组合算子相关 Bug。 [#70250](https://github.com/PaddlePaddle/Paddle/pull/70250), [#67170](https://github.com/PaddlePaddle/Paddle/pull/67170), [#71218](https://github.com/PaddlePaddle/Paddle/pull/71218), [#69095](https://github.com/PaddlePaddle/Paddle/pull/69095), [#70189](https://github.com/PaddlePaddle/Paddle/pull/70189) +- 修复 XPU 相关 Bug。 [#65149](https://github.com/PaddlePaddle/Paddle/pull/65149), [#70845](https://github.com/PaddlePaddle/Paddle/pull/70845) +- 修复 shape 相关 Bug。 [#68722](https://github.com/PaddlePaddle/Paddle/pull/68722), [#70210](https://github.com/PaddlePaddle/Paddle/pull/70210), [#70492](https://github.com/PaddlePaddle/Paddle/pull/70492) +- 修复 save/load 相关 Bug。 [#69153](https://github.com/PaddlePaddle/Paddle/pull/69153) +- 修复类型相关 Bug。 [#65721](https://github.com/PaddlePaddle/Paddle/pull/65721), [#65859](https://github.com/PaddlePaddle/Paddle/pull/65859) +- 修复算子调用和执行过程中的其他问题,包括类型匹配、类型推导、参数类型支持等。 [#65360](https://github.com/PaddlePaddle/Paddle/pull/65360), [#65024](https://github.com/PaddlePaddle/Paddle/pull/65024), [#66308](https://github.com/PaddlePaddle/Paddle/pull/66308), [#67085](https://github.com/PaddlePaddle/Paddle/pull/67085), [#67285](https://github.com/PaddlePaddle/Paddle/pull/67285), [#67076](https://github.com/PaddlePaddle/Paddle/pull/67076), [#67547](https://github.com/PaddlePaddle/Paddle/pull/67547), [#68007](https://github.com/PaddlePaddle/Paddle/pull/68007), [#68527](https://github.com/PaddlePaddle/Paddle/pull/68527), [#68549](https://github.com/PaddlePaddle/Paddle/pull/68549), [#68543](https://github.com/PaddlePaddle/Paddle/pull/68543), [#68604](https://github.com/PaddlePaddle/Paddle/pull/68604), [#68741](https://github.com/PaddlePaddle/Paddle/pull/68741), [#68859](https://github.com/PaddlePaddle/Paddle/pull/68859), [#69025](https://github.com/PaddlePaddle/Paddle/pull/69025), [#69065](https://github.com/PaddlePaddle/Paddle/pull/69065), 
[#69405](https://github.com/PaddlePaddle/Paddle/pull/69405), [#69688](https://github.com/PaddlePaddle/Paddle/pull/69688), [#69912](https://github.com/PaddlePaddle/Paddle/pull/69912), [#70177](https://github.com/PaddlePaddle/Paddle/pull/70177), [#70517](https://github.com/PaddlePaddle/Paddle/pull/70517), [#70596](https://github.com/PaddlePaddle/Paddle/pull/70596), [#70788](https://github.com/PaddlePaddle/Paddle/pull/70788), [#70870](https://github.com/PaddlePaddle/Paddle/pull/70870), [#71332](https://github.com/PaddlePaddle/Paddle/pull/71332), [#71454](https://github.com/PaddlePaddle/Paddle/pull/71454), [#71442](https://github.com/PaddlePaddle/Paddle/pull/71442), [#71499](https://github.com/PaddlePaddle/Paddle/pull/71499), [#67459](https://github.com/PaddlePaddle/Paddle/pull/67459), [#68470](https://github.com/PaddlePaddle/Paddle/pull/68470), [#70206](https://github.com/PaddlePaddle/Paddle/pull/70206) -### 性能优化 -- 为减少流水线并行中的 bubble,支持 backward 中参数和激活的反向计算拆分,新增 zero bubble pipeline 调度策略,提升训练性能。[#62865](https://github.com/PaddlePaddle/Paddle/pull/62865), [#62737](https://github.com/PaddlePaddle/Paddle/pull/62737), [#64534](https://github.com/PaddlePaddle/Paddle/pull/64534), -- 为提升序列并行(sequence parallel)的性能,对相关通信操作和计算操作进行 fusion,并优化冗余的 transopse 操作。[#64807](https://github.com/PaddlePaddle/Paddle/pull/64807), [#63948](https://github.com/PaddlePaddle/Paddle/pull/63948), [#64316](https://github.com/PaddlePaddle/Paddle/pull/64316), [#64119](https://github.com/PaddlePaddle/Paddle/pull/64119) -- 优化静态图自动并行图优化耗时,减少从启动训练到第一个 step 完成的延时。[#59912](https://github.com/PaddlePaddle/Paddle/pull/59912), [#61817](https://github.com/PaddlePaddle/Paddle/pull/61817), [#60022](https://github.com/PaddlePaddle/Paddle/pull/60022), [#60125](https://github.com/PaddlePaddle/Paddle/pull/60125) -- 优化混合并行场景下相关通信操作的耗时。[#62157](https://github.com/PaddlePaddle/Paddle/pull/62157), [#61622](https://github.com/PaddlePaddle/Paddle/pull/61622) -- 
优化自动并行动转静下参数的的冗余显存占用。[#62746](https://github.com/PaddlePaddle/Paddle/pull/62746) -- 完善自动并行的混合精度训练功能,支持设置局部 auto_cast 和黑白名单,支持 master grad 功能,适配不同的并行策略等。[60158](https://github.com/PaddlePaddle/Paddle/pull/60158), [#59987](https://github.com/PaddlePaddle/Paddle/pull/59987), [#62629](https://github.com/PaddlePaddle/Paddle/pull/62629), [#60385](https://github.com/PaddlePaddle/Paddle/pull/60385), [#62015](https://github.com/PaddlePaddle/Paddle/pull/62015), [#60514](https://github.com/PaddlePaddle/Paddle/pull/60514), [#61221](https://github.com/PaddlePaddle/Paddle/pull/61221), [#60779](https://github.com/PaddlePaddle/Paddle/pull/60779), [#63228](https://github.com/PaddlePaddle/Paddle/pull/63228) -- 优化 type promotion 和 amp 带来的非必要的 cast,提升性能。[#63293](https://github.com/PaddlePaddle/Paddle/pull/63293), [#63228](https://github.com/PaddlePaddle/Paddle/pull/63228) - -### 静态图自动并行架构升级 -- 基于新一代中间表示 PIR,新增 DistDialect,在计算图表示中原生支持了分布式属性(DistAttr)和分布式张量(DistTensor),实现了分布式属性和张量或算子的直接绑定,使自动并行架构更简洁统一。[#63828](https://github.com/PaddlePaddle/Paddle/pull/63828), [#64299](https://github.com/PaddlePaddle/Paddle/pull/64299), [#63870](https://github.com/PaddlePaddle/Paddle/pull/63870), [#64144](https://github.com/PaddlePaddle/Paddle/pull/64144), [#62524](https://github.com/PaddlePaddle/Paddle/pull/62524), [#62630](https://github.com/PaddlePaddle/Paddle/pull/62630), [#62897](https://github.com/PaddlePaddle/Paddle/pull/62897), [#60478](https://github.com/PaddlePaddle/Paddle/pull/60478), [#60574](https://github.com/PaddlePaddle/Paddle/pull/60574), [#63876](https://github.com/PaddlePaddle/Paddle/pull/63876), [#63798](https://github.com/PaddlePaddle/Paddle/pull/63798), [#62560](https://github.com/PaddlePaddle/Paddle/pull/62560), [#63676](https://github.com/PaddlePaddle/Paddle/pull/63676) -- 完成自动并行 PIR 新架构对 shard_tensor、reshard、to_static 等 API 的适配,支持用户将动态图模型组网直接转成 PIR 静态计算图进行优化和训练。[#62945](https://github.com/PaddlePaddle/Paddle/pull/62945), 
[#62356](https://github.com/PaddlePaddle/Paddle/pull/62356), [#60175](https://github.com/PaddlePaddle/Paddle/pull/60175), [#62654](https://github.com/PaddlePaddle/Paddle/pull/62654), [#63347](https://github.com/PaddlePaddle/Paddle/pull/63347) -- 优化静态图自动并行的图优化编译过程,通过重构优化静半中计算图切分和通信解析两个主要过程的实现,减少静态图编译优化耗时。[#64137](https://github.com/PaddlePaddle/Paddle/pull/64137), [#62201](https://github.com/PaddlePaddle/Paddle/pull/62201), [#64143](https://github.com/PaddlePaddle/Paddle/pull/64143), [#62560](https://github.com/PaddlePaddle/Paddle/pull/62560) -- 优化静态图中切分推导规则的调用流程,实现切分推导结果在动-静态图下的一致,提升了架构的统一性和稳定性。 [#62659](https://github.com/PaddlePaddle/Paddle/pull/62659), [#62547](https://github.com/PaddlePaddle/Paddle/pull/62547), [#63117](https://github.com/PaddlePaddle/Paddle/pull/63117), [#63434](https://github.com/PaddlePaddle/Paddle/pull/63434), [#63770](https://github.com/PaddlePaddle/Paddle/pull/63770), [#64361](https://github.com/PaddlePaddle/Paddle/pull/64361), [#63073](https://github.com/PaddlePaddle/Paddle/pull/63073) -- 升级静态图中张量切分转换的实现,动-静态图下使用一致的切分转换通信规则,保障动-静态图下张量切分转换执行逻辑和结果的一致性,提升用户体验。[#62718](https://github.com/PaddlePaddle/Paddle/pull/62718), [#62694](https://github.com/PaddlePaddle/Paddle/pull/62694), [#60215](https://github.com/PaddlePaddle/Paddle/pull/60215), [#63362](https://github.com/PaddlePaddle/Paddle/pull/63362), [#63072](https://github.com/PaddlePaddle/Paddle/pull/63072), [#63962](https://github.com/PaddlePaddle/Paddle/pull/63962), [#64223](https://github.com/PaddlePaddle/Paddle/pull/64223), [#61796](https://github.com/PaddlePaddle/Paddle/pull/61796), [#64465](https://github.com/PaddlePaddle/Paddle/pull/64465), [#64623](https://github.com/PaddlePaddle/Paddle/pull/64623), [#64418](https://github.com/PaddlePaddle/Paddle/pull/64418) - -### 训练策略自动搜索和调优 -为提升训练策略自动搜索和调优工具(AutoTuner)的易用性,支持用户自定义搜索项,支持设置搜索项的优先级,支持用户配置不合法的策略组合,全面增强了运行时和运行后日志中的报错信息,支持在 NPU 设备上进行 AutoTuner。[#60101](https://github.com/PaddlePaddle/Paddle/pull/60101), 
[#60294](https://github.com/PaddlePaddle/Paddle/pull/60294), [#61898](https://github.com/PaddlePaddle/Paddle/pull/61898), [#60248](https://github.com/PaddlePaddle/Paddle/pull/60248), [#60417](https://github.com/PaddlePaddle/Paddle/pull/60417), [#60954](https://github.com/PaddlePaddle/Paddle/pull/60954), [#61499](https://github.com/PaddlePaddle/Paddle/pull/61499), [#62724](https://github.com/PaddlePaddle/Paddle/pull/62724), [#60954](https://github.com/PaddlePaddle/Paddle/pull/60954), [#63693](https://github.com/PaddlePaddle/Paddle/pull/63693), [#62853](https://github.com/PaddlePaddle/Paddle/pull/62853), [#62984](https://github.com/PaddlePaddle/Paddle/pull/62984) - -## 5.Cuda 训练性能优化 -本次升级从算子计算效率、分布式通信优化、显存优化等多个角度实现了大模型训练效率的提升。 +### 其他 -### 功能完善 -- FlashAttention 算子功能增强,包含支持 NVIDIA SM90 GPU 编译,支持 Group Query Attention,支持 cuDNN 接入,支持 QKV-packed 形式输入等。[#59820](https://github.com/PaddlePaddle/Paddle/pull/59820),[#60776](https://github.com/PaddlePaddle/Paddle/pull/60776),[#58680](https://github.com/PaddlePaddle/Paddle/pull/58680),[#63289](https://github.com/PaddlePaddle/Paddle/pull/63289) -- repeat_interleave 算子添加 BFloat16 数据类型的支持。[#61854](https://github.com/PaddlePaddle/Paddle/pull/61854) -- 针对 fused_scale_bias_add_relu、fused_scale_bias_relu_conv_bn、fused_dconv_drelu_dbn 等 ResNet 类模型接口参数多、算子易用性查等问题,添加了 fuse_resunit pass,支持上述算子的自动融合,实现通用性能优化。([#59771](https://github.com/PaddlePaddle/Paddle/pull/59771)) +- 优化代码风格。 [#68536](https://github.com/PaddlePaddle/Paddle/pull/68536) +- 修复拼写错误。 [#67456](https://github.com/PaddlePaddle/Paddle/pull/67456), [#66673](https://github.com/PaddlePaddle/Paddle/pull/66673), [#68702](https://github.com/PaddlePaddle/Paddle/pull/68702), [#68735](https://github.com/PaddlePaddle/Paddle/pull/68735), [#68718](https://github.com/PaddlePaddle/Paddle/pull/68718), [#70700](https://github.com/PaddlePaddle/Paddle/pull/70700), [#70682](https://github.com/PaddlePaddle/Paddle/pull/70682), [#70670](https://github.com/PaddlePaddle/Paddle/pull/70670), 
[#70241](https://github.com/PaddlePaddle/Paddle/pull/70241), [#69626](https://github.com/PaddlePaddle/Paddle/pull/69626), [#70051](https://github.com/PaddlePaddle/Paddle/pull/70051), [#67764](https://github.com/PaddlePaddle/Paddle/pull/67764), [#68872](https://github.com/PaddlePaddle/Paddle/pull/68872), [#70055](https://github.com/PaddlePaddle/Paddle/pull/70055), [#67954](https://github.com/PaddlePaddle/Paddle/pull/67954), [#67404](https://github.com/PaddlePaddle/Paddle/pull/67404), [#69273](https://github.com/PaddlePaddle/Paddle/pull/69273), [#66981](https://github.com/PaddlePaddle/Paddle/pull/66981), [#68145](https://github.com/PaddlePaddle/Paddle/pull/68145), [#69148](https://github.com/PaddlePaddle/Paddle/pull/69148), [#69145](https://github.com/PaddlePaddle/Paddle/pull/69145), [#69168](https://github.com/PaddlePaddle/Paddle/pull/69168), [#68940](https://github.com/PaddlePaddle/Paddle/pull/68940), [#70344](https://github.com/PaddlePaddle/Paddle/pull/70344) +- 修改接口文档。 [#69378](https://github.com/PaddlePaddle/Paddle/pull/69378) +- 替换 fluid 算子体系下的算子及参数命名。 [#69345](https://github.com/PaddlePaddle/Paddle/pull/69345), [#69382](https://github.com/PaddlePaddle/Paddle/pull/69382), [#69484](https://github.com/PaddlePaddle/Paddle/pull/69484), [#69444](https://github.com/PaddlePaddle/Paddle/pull/69444) + +### 废弃 + +- xshape 输出退场。 [#66769](https://github.com/PaddlePaddle/Paddle/pull/66769), [#67009](https://github.com/PaddlePaddle/Paddle/pull/67009), [#67152](https://github.com/PaddlePaddle/Paddle/pull/67152), [#67172](https://github.com/PaddlePaddle/Paddle/pull/67172), [#67355](https://github.com/PaddlePaddle/Paddle/pull/67355), [#67373](https://github.com/PaddlePaddle/Paddle/pull/67373), [#66089](https://github.com/PaddlePaddle/Paddle/pull/66089) +- 移除 fluid 体系下废弃的算子及其 kernel、相关单测、相关调用代码。 [#67370](https://github.com/PaddlePaddle/Paddle/pull/67370), [#67088](https://github.com/PaddlePaddle/Paddle/pull/67088), [#67324](https://github.com/PaddlePaddle/Paddle/pull/67324), 
[#67666](https://github.com/PaddlePaddle/Paddle/pull/67666), [#68058](https://github.com/PaddlePaddle/Paddle/pull/68058), [#68311](https://github.com/PaddlePaddle/Paddle/pull/68311), [#68358](https://github.com/PaddlePaddle/Paddle/pull/68358), [#68312](https://github.com/PaddlePaddle/Paddle/pull/68312), [#68355](https://github.com/PaddlePaddle/Paddle/pull/68355), [#67528](https://github.com/PaddlePaddle/Paddle/pull/67528), [#68316](https://github.com/PaddlePaddle/Paddle/pull/68316), [#68356](https://github.com/PaddlePaddle/Paddle/pull/68356), [#68397](https://github.com/PaddlePaddle/Paddle/pull/68397), [#68441](https://github.com/PaddlePaddle/Paddle/pull/68441), [#68417](https://github.com/PaddlePaddle/Paddle/pull/68417), [#68567](https://github.com/PaddlePaddle/Paddle/pull/68567), [#68583](https://github.com/PaddlePaddle/Paddle/pull/68583), [#68649](https://github.com/PaddlePaddle/Paddle/pull/68649), [#68331](https://github.com/PaddlePaddle/Paddle/pull/68331), [#68730](https://github.com/PaddlePaddle/Paddle/pull/68730), [#69754](https://github.com/PaddlePaddle/Paddle/pull/69754), [#69445](https://github.com/PaddlePaddle/Paddle/pull/69445), [#69921](https://github.com/PaddlePaddle/Paddle/pull/69921), [#70268](https://github.com/PaddlePaddle/Paddle/pull/70268), [#69446](https://github.com/PaddlePaddle/Paddle/pull/69446), [#69544](https://github.com/PaddlePaddle/Paddle/pull/69544), [#70272](https://github.com/PaddlePaddle/Paddle/pull/70272), [#69745](https://github.com/PaddlePaddle/Paddle/pull/69745), [#70300](https://github.com/PaddlePaddle/Paddle/pull/70300), [#70388](https://github.com/PaddlePaddle/Paddle/pull/70388), [#70421](https://github.com/PaddlePaddle/Paddle/pull/70421), [#70302](https://github.com/PaddlePaddle/Paddle/pull/70302), [#70445](https://github.com/PaddlePaddle/Paddle/pull/70445), [#69275](https://github.com/PaddlePaddle/Paddle/pull/69275), [#69081](https://github.com/PaddlePaddle/Paddle/pull/69081), 
[#70588](https://github.com/PaddlePaddle/Paddle/pull/70588), [#67778](https://github.com/PaddlePaddle/Paddle/pull/67778), [#67953](https://github.com/PaddlePaddle/Paddle/pull/67953), [#68093](https://github.com/PaddlePaddle/Paddle/pull/68093), [#68092](https://github.com/PaddlePaddle/Paddle/pull/68092), [#67684](https://github.com/PaddlePaddle/Paddle/pull/67684), [#69665](https://github.com/PaddlePaddle/Paddle/pull/69665), [#67915](https://github.com/PaddlePaddle/Paddle/pull/67915), [#67917](https://github.com/PaddlePaddle/Paddle/pull/67917), [#68403](https://github.com/PaddlePaddle/Paddle/pull/68403), [#68404](https://github.com/PaddlePaddle/Paddle/pull/68404), [#68969](https://github.com/PaddlePaddle/Paddle/pull/68969), [#68953](https://github.com/PaddlePaddle/Paddle/pull/68953), [#68954](https://github.com/PaddlePaddle/Paddle/pull/68954), [#68942](https://github.com/PaddlePaddle/Paddle/pull/68942), [#68950](https://github.com/PaddlePaddle/Paddle/pull/68950), [#69381](https://github.com/PaddlePaddle/Paddle/pull/69381), [#69380](https://github.com/PaddlePaddle/Paddle/pull/69380), [#69448](https://github.com/PaddlePaddle/Paddle/pull/69448), [#69680](https://github.com/PaddlePaddle/Paddle/pull/69680), [#69775](https://github.com/PaddlePaddle/Paddle/pull/69775), [#69812](https://github.com/PaddlePaddle/Paddle/pull/69812), [#69840](https://github.com/PaddlePaddle/Paddle/pull/69840), [#69828](https://github.com/PaddlePaddle/Paddle/pull/69828), [#69742](https://github.com/PaddlePaddle/Paddle/pull/69742), [#69923](https://github.com/PaddlePaddle/Paddle/pull/69923), [#69922](https://github.com/PaddlePaddle/Paddle/pull/69922), [#69904](https://github.com/PaddlePaddle/Paddle/pull/69904), [#70002](https://github.com/PaddlePaddle/Paddle/pull/70002), [#70054](https://github.com/PaddlePaddle/Paddle/pull/70054), [#70052](https://github.com/PaddlePaddle/Paddle/pull/70052), [#70053](https://github.com/PaddlePaddle/Paddle/pull/70053), 
[#70713](https://github.com/PaddlePaddle/Paddle/pull/70713), [#70718](https://github.com/PaddlePaddle/Paddle/pull/70718), [#70717](https://github.com/PaddlePaddle/Paddle/pull/70717) +- 移除废弃 Flag。 [#70727](https://github.com/PaddlePaddle/Paddle/pull/70727), [#70726](https://github.com/PaddlePaddle/Paddle/pull/70726) +- 移除组合算子废弃 API。 [#69873](https://github.com/PaddlePaddle/Paddle/pull/69873), [#69309](https://github.com/PaddlePaddle/Paddle/pull/69309) + +### 开发者相关 + +- 支持组合算子,包括适配算子、添加 Flag、测试用例等。 [#67725](https://github.com/PaddlePaddle/Paddle/pull/67725), [#65252](https://github.com/PaddlePaddle/Paddle/pull/65252), [#67590](https://github.com/PaddlePaddle/Paddle/pull/67590), [#68076](https://github.com/PaddlePaddle/Paddle/pull/68076), [#66711](https://github.com/PaddlePaddle/Paddle/pull/66711), [#68813](https://github.com/PaddlePaddle/Paddle/pull/68813), [#68928](https://github.com/PaddlePaddle/Paddle/pull/68928), [#69054](https://github.com/PaddlePaddle/Paddle/pull/69054), [#69156](https://github.com/PaddlePaddle/Paddle/pull/69156), [#69255](https://github.com/PaddlePaddle/Paddle/pull/69255), [#69460](https://github.com/PaddlePaddle/Paddle/pull/69460), [#70270](https://github.com/PaddlePaddle/Paddle/pull/70270) +- 为算子添加单测。 [#68272](https://github.com/PaddlePaddle/Paddle/pull/68272), [#68490](https://github.com/PaddlePaddle/Paddle/pull/68490) +- 增加算子 API 别名用于 PaddleCustomDevice。 [#69526](https://github.com/PaddlePaddle/Paddle/pull/69526) +- 移动算子定义位置,使其只支持动态图。 [#69289](https://github.com/PaddlePaddle/Paddle/pull/69289) +- 标注仅前向计算的算子。 [#68580](https://github.com/PaddlePaddle/Paddle/pull/68580) +- 将 view 运算的反向算子改为复用前向算子,从而支持科学计算场景下高阶微分的需求。 [#71086](https://github.com/PaddlePaddle/Paddle/pull/71086) +- 迁移算子文件位置/修改函数命名空间/修改函数参数名等。 [#66393](https://github.com/PaddlePaddle/Paddle/pull/66393), [#67066](https://github.com/PaddlePaddle/Paddle/pull/67066), 
[#67012](https://github.com/PaddlePaddle/Paddle/pull/67012), [#67243](https://github.com/PaddlePaddle/Paddle/pull/67243), [#67367](https://github.com/PaddlePaddle/Paddle/pull/67367), [#67760](https://github.com/PaddlePaddle/Paddle/pull/67760), [#67242](https://github.com/PaddlePaddle/Paddle/pull/67242), [#67189](https://github.com/PaddlePaddle/Paddle/pull/67189), [#67899](https://github.com/PaddlePaddle/Paddle/pull/67899), [#67687](https://github.com/PaddlePaddle/Paddle/pull/67687), [#68035](https://github.com/PaddlePaddle/Paddle/pull/68035), [#67682](https://github.com/PaddlePaddle/Paddle/pull/67682), [#68464](https://github.com/PaddlePaddle/Paddle/pull/68464), [#68469](https://github.com/PaddlePaddle/Paddle/pull/68469), [#67900](https://github.com/PaddlePaddle/Paddle/pull/67900), [#68563](https://github.com/PaddlePaddle/Paddle/pull/68563), [#68562](https://github.com/PaddlePaddle/Paddle/pull/68562), [#68564](https://github.com/PaddlePaddle/Paddle/pull/68564), [#68479](https://github.com/PaddlePaddle/Paddle/pull/68479), [#68588](https://github.com/PaddlePaddle/Paddle/pull/68588), [#68726](https://github.com/PaddlePaddle/Paddle/pull/68726), [#68719](https://github.com/PaddlePaddle/Paddle/pull/68719), [#68767](https://github.com/PaddlePaddle/Paddle/pull/68767), [#68557](https://github.com/PaddlePaddle/Paddle/pull/68557), [#68671](https://github.com/PaddlePaddle/Paddle/pull/68671), [#68786](https://github.com/PaddlePaddle/Paddle/pull/68786), [#67948](https://github.com/PaddlePaddle/Paddle/pull/67948), [#64999](https://github.com/PaddlePaddle/Paddle/pull/64999), [#68581](https://github.com/PaddlePaddle/Paddle/pull/68581), [#68361](https://github.com/PaddlePaddle/Paddle/pull/68361), [#68656](https://github.com/PaddlePaddle/Paddle/pull/68656), [#68396](https://github.com/PaddlePaddle/Paddle/pull/68396), [#68059](https://github.com/PaddlePaddle/Paddle/pull/68059), [#68785](https://github.com/PaddlePaddle/Paddle/pull/68785), 
[#68665](https://github.com/PaddlePaddle/Paddle/pull/68665), [#68869](https://github.com/PaddlePaddle/Paddle/pull/68869), [#67626](https://github.com/PaddlePaddle/Paddle/pull/67626), [#68921](https://github.com/PaddlePaddle/Paddle/pull/68921), [#69268](https://github.com/PaddlePaddle/Paddle/pull/69268), [#69271](https://github.com/PaddlePaddle/Paddle/pull/69271), [#69306](https://github.com/PaddlePaddle/Paddle/pull/69306), [#69302](https://github.com/PaddlePaddle/Paddle/pull/69302), [#69341](https://github.com/PaddlePaddle/Paddle/pull/69341), [#69364](https://github.com/PaddlePaddle/Paddle/pull/69364), [#69343](https://github.com/PaddlePaddle/Paddle/pull/69343), [#69383](https://github.com/PaddlePaddle/Paddle/pull/69383), [#69415](https://github.com/PaddlePaddle/Paddle/pull/69415), [#69437](https://github.com/PaddlePaddle/Paddle/pull/69437), [#69494](https://github.com/PaddlePaddle/Paddle/pull/69494), [#69541](https://github.com/PaddlePaddle/Paddle/pull/69541), [#69543](https://github.com/PaddlePaddle/Paddle/pull/69543), [#69540](https://github.com/PaddlePaddle/Paddle/pull/69540), [#69569](https://github.com/PaddlePaddle/Paddle/pull/69569), [#69568](https://github.com/PaddlePaddle/Paddle/pull/69568), [#69621](https://github.com/PaddlePaddle/Paddle/pull/69621), [#69622](https://github.com/PaddlePaddle/Paddle/pull/69622), [#69701](https://github.com/PaddlePaddle/Paddle/pull/69701), [#69702](https://github.com/PaddlePaddle/Paddle/pull/69702), [#69704](https://github.com/PaddlePaddle/Paddle/pull/69704), [#69743](https://github.com/PaddlePaddle/Paddle/pull/69743), [#69780](https://github.com/PaddlePaddle/Paddle/pull/69780), [#69814](https://github.com/PaddlePaddle/Paddle/pull/69814), [#69822](https://github.com/PaddlePaddle/Paddle/pull/69822), [#69893](https://github.com/PaddlePaddle/Paddle/pull/69893), [#69967](https://github.com/PaddlePaddle/Paddle/pull/69967), [#69976](https://github.com/PaddlePaddle/Paddle/pull/69976), 
[#70011](https://github.com/PaddlePaddle/Paddle/pull/70011), [#70015](https://github.com/PaddlePaddle/Paddle/pull/70015), [#70007](https://github.com/PaddlePaddle/Paddle/pull/70007), [#70010](https://github.com/PaddlePaddle/Paddle/pull/70010), [#70346](https://github.com/PaddlePaddle/Paddle/pull/70346), [#70414](https://github.com/PaddlePaddle/Paddle/pull/70414), [#69951](https://github.com/PaddlePaddle/Paddle/pull/69951), [#70299](https://github.com/PaddlePaddle/Paddle/pull/70299), [#70441](https://github.com/PaddlePaddle/Paddle/pull/70441), [#70435](https://github.com/PaddlePaddle/Paddle/pull/70435), [#68420](https://github.com/PaddlePaddle/Paddle/pull/68420), [#70671](https://github.com/PaddlePaddle/Paddle/pull/70671), [#70705](https://github.com/PaddlePaddle/Paddle/pull/70705), [#68540](https://github.com/PaddlePaddle/Paddle/pull/68540), [#70211](https://github.com/PaddlePaddle/Paddle/pull/70211), [#67489](https://github.com/PaddlePaddle/Paddle/pull/67489), [#66927](https://github.com/PaddlePaddle/Paddle/pull/66927), [#66942](https://github.com/PaddlePaddle/Paddle/pull/66942), [#66848](https://github.com/PaddlePaddle/Paddle/pull/66848), [#66796](https://github.com/PaddlePaddle/Paddle/pull/66796), [#67036](https://github.com/PaddlePaddle/Paddle/pull/67036), [#67244](https://github.com/PaddlePaddle/Paddle/pull/67244), [#67299](https://github.com/PaddlePaddle/Paddle/pull/67299), [#67171](https://github.com/PaddlePaddle/Paddle/pull/67171), [#67293](https://github.com/PaddlePaddle/Paddle/pull/67293), [#67208](https://github.com/PaddlePaddle/Paddle/pull/67208), [#67408](https://github.com/PaddlePaddle/Paddle/pull/67408), [#67523](https://github.com/PaddlePaddle/Paddle/pull/67523), [#67689](https://github.com/PaddlePaddle/Paddle/pull/67689), [#67694](https://github.com/PaddlePaddle/Paddle/pull/67694), [#67797](https://github.com/PaddlePaddle/Paddle/pull/67797), [#67894](https://github.com/PaddlePaddle/Paddle/pull/67894), 
[#65969](https://github.com/PaddlePaddle/Paddle/pull/65969), [#65939](https://github.com/PaddlePaddle/Paddle/pull/65939), [#67928](https://github.com/PaddlePaddle/Paddle/pull/67928), [#68097](https://github.com/PaddlePaddle/Paddle/pull/68097), [#66744](https://github.com/PaddlePaddle/Paddle/pull/66744), [#68496](https://github.com/PaddlePaddle/Paddle/pull/68496), [#66943](https://github.com/PaddlePaddle/Paddle/pull/66943), [#68773](https://github.com/PaddlePaddle/Paddle/pull/68773), [#69272](https://github.com/PaddlePaddle/Paddle/pull/69272) +- 移动测试文件位置。 [#67564](https://github.com/PaddlePaddle/Paddle/pull/67564), [#68266](https://github.com/PaddlePaddle/Paddle/pull/68266), [#68634](https://github.com/PaddlePaddle/Paddle/pull/68634) +- xshape 输出退场相关前置修改。 [#67543](https://github.com/PaddlePaddle/Paddle/pull/67543), [#67572](https://github.com/PaddlePaddle/Paddle/pull/67572) + +### 改进 + +- 支持了更多数据类型。 [#69143](https://github.com/PaddlePaddle/Paddle/pull/69143) +- 更新 xpu 接口。 [#69800](https://github.com/PaddlePaddle/Paddle/pull/69800) +- 改进了算子打印功能。 [#69916](https://github.com/PaddlePaddle/Paddle/pull/69916) +- 升级了 normalize 操作以支持更多场景。 [#70152](https://github.com/PaddlePaddle/Paddle/pull/70152) +- 扩展了 group_norm 以处理 rank 大于 5 的情况。 [#68774](https://github.com/PaddlePaddle/Paddle/pull/68774) +- 改进了 backward_blacklist 的使用。 [#69356](https://github.com/PaddlePaddle/Paddle/pull/69356) ### 性能提升 -- 针对 Llama 类模型 SwiGLU 激活模块计算过程显存占用较大的问题,新增了 SwiGLU 融合算子,节省中间变量的显存占用,从而降低大模型训练过程显存开销,减少重计算以提升性能,Llama-70B 模型性能提升 9%。 [#61508](https://github.com/PaddlePaddle/Paddle/pull/61508) -- 针对序列并行(Sequence Parallel)过程通信占比较高的问题,实现了序列并行反向过程通信与 Matmul 计算的 overlap,节省端到端耗时,在大模型训练场景端到端性能提升 1%~2%。[#62284](https://github.com/PaddlePaddle/Paddle/pull/62284),[#63531](https://github.com/PaddlePaddle/Paddle/pull/63531) -- 针对 Sharding 反向通信后仍需要除以 nranks 导致训练速度慢的问题,支持了反向通信与除以 nranks 运算的融合,支持 ReduceScatter Average 的模式,提升大模型训练性能。[#62623](https://github.com/PaddlePaddle/Paddle/pull/62623) -- 
针对张量模型并行过程输入数据广播过程导致训练速度抖动的问题,修复了数据广播过程的不必要的 CPU 和 GPU 间的同步,保证训练速度的稳定性。[#60816](https://github.com/PaddlePaddle/Paddle/pull/60816) -- 针对流水线模型并行 P2P 通信时间较长导致训练速度低下的问题,实现了 P2P 通信与前反向计算的 overlap,大模型端到端训练性能提升 2%~3%。[#61935](https://github.com/PaddlePaddle/Paddle/pull/61935),[#62051](https://github.com/PaddlePaddle/Paddle/pull/62051) -- 针对 fused_linear_param_grad_add 算子 bias 梯度计算效率低下问题,优化了 bias 梯度计算环节的计算效率,大模型端到端训练性能提升 0.2%。[#63114](https://github.com/PaddlePaddle/Paddle/pull/63114) -- 针对 Sharding 反向计算结束后参数广播过程耗时较长的问题,实现了参数广播与下一个 step 计算的 overlap,大模型端到端训练性能提升 2%以上。[#63945](https://github.com/PaddlePaddle/Paddle/pull/63945) -- 针对流水线并行训练过程梯度占用显存过高从而引入过多重计算导致训练速度慢的问题,实现了梯度动态释放技术,大模型端到端训练性能提升 3.4%。[#59739](https://github.com/PaddlePaddle/Paddle/pull/59739) -### Bug 修复 -- 修复 StreamSafeCUDAAllocator CUDA Event 资源泄露导致大模型训练降速等问题。[#64621](https://github.com/PaddlePaddle/Paddle/pull/64621) -- 修复 fused_rotary_position_embedding 算子反向计算错误的 bug。[#60217](https://github.com/PaddlePaddle/Paddle/pull/60217) -- 修复自定义算子在 AMP 场景下无法通过黑白名单控制计算精度的 bug。[#60052](https://github.com/PaddlePaddle/Paddle/pull/60052) -- 修复 add_、divide_等原生支持不同数据类型运算的算子在类型提升时发生预期外的类型提升的 bug。[#64302](https://github.com/PaddlePaddle/Paddle/pull/64302) +- 优化了 where_double_grad 算子的性能。 [#70404](https://github.com/PaddlePaddle/Paddle/pull/70404) +- 将 for range 改为 slice 加快 grad 执行速度。 [#69938](https://github.com/PaddlePaddle/Paddle/pull/69938) -## 6.分布式策略增强 -重点强化了飞桨动态图分布式计算功能体验,对 AutoTuner、流水线并行、Sharding 等并行策略做了多方面的功能改进,增强了大模型训练的灵活性;新增 Flash Attention Mask 等功能,显著降低大模型训练特别是长 sequence 训练的显存占用,提升训练性能,为大模型训练提供更强的能力支持;另外修复了若干 Bug 以及潜在的安全性风险,显著提升了系统整体稳定性。 +## 6. 
框架性能优化 -### 功能优化 -- 优化了 Autotuner 的搜索空间,大幅提升了搜索的性能。[#62608](https://github.com/PaddlePaddle/Paddle/pull/62608) -- 针对流水线并行中由于在 eval 过程检查发送类型,导致训练可能出错的问题,增加训练配置,跳过流水线发送的冗余接收检查,灵活性更高、性能更好。[#63001](https://github.com/PaddlePaddle/Paddle/pull/63001) -- 在动态图流水并行中,增加了发送和接收数据的大小和类型的的检查,增加报错信息,使得鲁棒性、可调试性更好。[#59405](https://github.com/PaddlePaddle/Paddle/pull/59405) -- 支持动态图流水并行设定多个损失函数,并返回多个 loss,提升了动态图流水线的灵活性。[#63167](https://github.com/PaddlePaddle/Paddle/pull/63167) -- 在动态图流水并行中,增加流水线缓存清除配置选项,可以及时清除流水线中发送和接受的 cache,更好的支持动态 batchsize 训练。[#62277](https://github.com/PaddlePaddle/Paddle/pull/62277) -- 针对 sharding stage3 策略无法逐位对齐的问题,将无序的 set 集合换成了有序的 OrderedSet,避免了累加顺序导致的误差,修复完后可以逐位对齐。[#60085](https://github.com/PaddlePaddle/Paddle/pull/60085) -- 为了进一步降低针对序列并行中显存占用,新增重计算 allgather 的方法,减少 allgather 的 activation 的显存大小。[#64244](https://github.com/PaddlePaddle/Paddle/pull/64244) - -### 动态图新功能 -- 针对 autotuner 的搜索空间,新增了 refined recompute 的搜索维度,使得搜索结果更精准,调优模型的门槛更低。[#62430](https://github.com/PaddlePaddle/Paddle/pull/62430) -- 针对虚拟流水线并行中,需要限制训练批大小的问题,修改了流水线调度方式,解除批大小限制,支持更灵活的批大小。[#61561](https://github.com/PaddlePaddle/Paddle/pull/61561),[#60314](https://github.com/PaddlePaddle/Paddle/pull/60134) -- 针对使用 flash attention 具有 mask 时,mask 的显存占用随序列长度呈二次方复杂度、性能低的问题,使用稀疏的 mask 表达、优化 mask 的显存,显存复杂度从序列长度的二次方降低为一次方,减少了存储的访问次数,同时使用 share memory 加速访存,大幅提升性能。[#62029](https://github.com/PaddlePaddle/Paddle/pull/62029) -- 动态图 Sharding 并行策略新增完善通信和计算 overlap 功能,提升训练过程中的性能。[#60455](https://github.com/PaddlePaddle/Paddle/pull/60455) - -### 通信库功能优化 -- 增强 NCCL 通信库的功能,支持初始化时传入额外的初始化参数以支持定制的 NCCL 库的初始化。[#62193](https://github.com/PaddlePaddle/Paddle/pull/62193) -- 增加 NCCL 库路径查找功能,支持更灵活的 NCCL 库查找方式。[#62492](https://github.com/PaddlePaddle/Paddle/pull/62492) +性能优化相关 PR,包括优化算子性能、优化 kernel 表现、优化内存、优化命名空间等,给使用者带来更好的开发体验。 -### Bug 修复 -- 修复 fused\_linear\_param\_grad\_add\_kernel 算子 dbias_out 
空间申请问题,同时增加梯度地址检查逻辑,使得报错信息更易调试。[#63433](https://github.com/PaddlePaddle/Paddle/pull/63433),[#64460](https://github.com/PaddlePaddle/Paddle/pull/64460) -- 修复 sharding 策略在支持 reduce_avg 操作中、comm_overlap 在关闭时未对梯度进行缩放的问题。[#62702](https://github.com/PaddlePaddle/Paddle/pull/62702) -- 解决 Stage2 中 main grad 计算顺序、fusion 相关的 bug。[#59142](https://github.com/PaddlePaddle/Paddle/pull/59142) -- 修复 sharding 策略下,当开启 reduce_avg 通信操作时,无法找到该开关属性的问题。[#62502](https://github.com/PaddlePaddle/Paddle/pull/62502) -- sharding stage1 训练支持非训练参数训练,解决部分参数设置 stop_gradient=True 的问题。[#62616](https://github.com/PaddlePaddle/Paddle/pull/62616) -- 修正 TCP 关闭时打印的信息,防止误导用户。[#62631](https://github.com/PaddlePaddle/Paddle/pull/62631) -- 针对数据并行训练中,部分梯度没有初始化,出现 segmentation fault 错误,修改 DataParallel 训练问题,解决多卡训练出错的问题。[#62299](https://github.com/PaddlePaddle/Paddle/pull/62299) -- 针对开启序列并行的场景,修复了部分模型因为权重冻结而导致的 bug。[#63596](https://github.com/PaddlePaddle/Paddle/pull/63596) -- 针对单路 dp 的 autotuner 场景,修复了一些 bug。[#60757](https://github.com/PaddlePaddle/Paddle/pull/60757) -- 修复流水并行策略 aadiff bug。 ([#64716](https://github.com/PaddlePaddle/Paddle/pull/64716)) -- 移除部分分布式单测。 ([#62762](https://github.com/PaddlePaddle/Paddle/pull/62762)) - -### 安全风险修复 -- 针对 prune\_by\_memory\_estimation 算子中存在安全泄露风险,修补安全漏洞。[#61320](https://github.com/PaddlePaddle/Paddle/pull/61320) - -## 7.参数服务器 -本次更新主要修复了参数服务器使用过程的若干 bug 以及编译安装等问题。 +### 新特性 + +- 增强对 fp8 类型的支持。 [#64735](https://github.com/PaddlePaddle/Paddle/pull/64735), [#64955](https://github.com/PaddlePaddle/Paddle/pull/64955) +- 增强对 xpu 的支持。 [#65362](https://github.com/PaddlePaddle/Paddle/pull/65362), [#65304](https://github.com/PaddlePaddle/Paddle/pull/65304), [#68451](https://github.com/PaddlePaddle/Paddle/pull/68451) +- 增强对 DCU 的支持。 [#65398](https://github.com/PaddlePaddle/Paddle/pull/65398), [#65857](https://github.com/PaddlePaddle/Paddle/pull/65857), [#66423](https://github.com/PaddlePaddle/Paddle/pull/66423) +- 扩展 oneDNN 能力。 
[#66000](https://github.com/PaddlePaddle/Paddle/pull/66000), [#66474](https://github.com/PaddlePaddle/Paddle/pull/66474), [#66568](https://github.com/PaddlePaddle/Paddle/pull/66568) +- 重命名参数并支持更复杂的 mask。 [#65409](https://github.com/PaddlePaddle/Paddle/pull/65409) +- 支持 flash-attention。 [#68968](https://github.com/PaddlePaddle/Paddle/pull/68968) +- 支持 OpenVINO CPU 高性能推理。 [#69122](https://github.com/PaddlePaddle/Paddle/pull/69122) + +### 功能改进 + +- 增强 PIR pass 以实现更好融合。 [#65540](https://github.com/PaddlePaddle/Paddle/pull/65540) +- 增强 OneDNN 功能。 [#65971](https://github.com/PaddlePaddle/Paddle/pull/65971), [#70430](https://github.com/PaddlePaddle/Paddle/pull/70430), [#70630](https://github.com/PaddlePaddle/Paddle/pull/70630), [#70871](https://github.com/PaddlePaddle/Paddle/pull/70871) +- 提升 FlashMask 性能。 [#68109](https://github.com/PaddlePaddle/Paddle/pull/68109) +- 优化 kernel 表现。 [#69660](https://github.com/PaddlePaddle/Paddle/pull/69660), [#69596](https://github.com/PaddlePaddle/Paddle/pull/69596) +- 组合算子优化。 [#69515](https://github.com/PaddlePaddle/Paddle/pull/69515), [#69616](https://github.com/PaddlePaddle/Paddle/pull/69616) ### Bug 修复 -- 针对 unique 算子读写越界的问题,修复了 unique 算子计算过程长度设置错误问题,保证 unique 算子运算正确性。[#60840](https://github.com/PaddlePaddle/Paddle/pull/60840) -- 针对 PGLBox 训练过程 save/load 功能缺失以及编译错误等问题,修复了 PGLBox save/load 和编译过程的若干 bug,保证了 PGLBox 功能的正确性。[#63905](https://github.com/PaddlePaddle/Paddle/pull/63905) -- 针对 CPUPS 训练过程触发 GPUPS 逻辑导致训练挂掉的问题,修复了 CPUPS 中 use_ps_gpu 的设置值,保证 CPUPS 训练流程的正确性。[#61406](https://github.com/PaddlePaddle/Paddle/pull/61406) -- 针对 GPUPS 在 CUDA 12.3 中训练出 cudaErrorInvalidResourceHandle 错误的问题,加入了 device id 切换机制,保证在正确的设备上进行对应的资源操作。[#63391](https://github.com/PaddlePaddle/Paddle/pull/63391) -- 针对 PGLBox Embedding Dump 过程出现乱码的问题,修复了 C++ std::string 使用不当的 bug,保证 Embedding Dump 结果的正确性。[#65179](https://github.com/PaddlePaddle/Paddle/pull/65179) -### 文档完善 -- 在 RPC 
接口文档中接入安全警告,提醒用户需要在安全的网络条件下使用此接口。[#64100](https://github.com/PaddlePaddle/Paddle/pull/64100) +- 修复 PIR、CINN、SOT、OneDNN 等相关的 Bug。 [#68951](https://github.com/PaddlePaddle/Paddle/pull/68951), [#69553](https://github.com/PaddlePaddle/Paddle/pull/69553), [#69682](https://github.com/PaddlePaddle/Paddle/pull/69682), [#67741](https://github.com/PaddlePaddle/Paddle/pull/67741), [#69346](https://github.com/PaddlePaddle/Paddle/pull/69346), [#69401](https://github.com/PaddlePaddle/Paddle/pull/69401), [#68903](https://github.com/PaddlePaddle/Paddle/pull/68903) +- 修复组合算子相关 Bug。 [#69479](https://github.com/PaddlePaddle/Paddle/pull/69479), [#69487](https://github.com/PaddlePaddle/Paddle/pull/69487), [#67176](https://github.com/PaddlePaddle/Paddle/pull/67176) +- 修复 CPU 上的 FP8 数据类型问题。 [#65539](https://github.com/PaddlePaddle/Paddle/pull/65539) +- 去除计算流下不必要的创建 event 的开销 。 [#67315](https://github.com/PaddlePaddle/Paddle/pull/67247) +- 修复性能问题。 [#68378](https://github.com/PaddlePaddle/Paddle/pull/68378) +- 修复类型相关问题。 [#69720](https://github.com/PaddlePaddle/Paddle/pull/69720) +- 修复其他问题。 [#70019](https://github.com/PaddlePaddle/Paddle/pull/70019), [#70008](https://github.com/PaddlePaddle/Paddle/pull/70008), [#70645](https://github.com/PaddlePaddle/Paddle/pull/70645), [#71209](https://github.com/PaddlePaddle/Paddle/pull/71209), [#68152](https://github.com/PaddlePaddle/Paddle/pull/68152), [#69907](https://github.com/PaddlePaddle/Paddle/pull/69907), [#71207](https://github.com/PaddlePaddle/Paddle/pull/71207) + +### 性能优化 + +- CINN 编译器相关优化。 [#69455](https://github.com/PaddlePaddle/Paddle/pull/69455), [#70284](https://github.com/PaddlePaddle/Paddle/pull/70284), [#67576](https://github.com/PaddlePaddle/Paddle/pull/67576), [#68946](https://github.com/PaddlePaddle/Paddle/pull/68946), [#68615](https://github.com/PaddlePaddle/Paddle/pull/68615) +- oneDNN 相关优化。 [#68784](https://github.com/PaddlePaddle/Paddle/pull/68784), [#68716](https://github.com/PaddlePaddle/Paddle/pull/68716), 
[#67554](https://github.com/PaddlePaddle/Paddle/pull/67554) +- 内存相关优化。 [#68660](https://github.com/PaddlePaddle/Paddle/pull/68571), [#69930](https://github.com/PaddlePaddle/Paddle/pull/69930), [#68174](https://github.com/PaddlePaddle/Paddle/pull/68174), [#68660](https://github.com/PaddlePaddle/Paddle/pull/68571), [#70359](https://github.com/PaddlePaddle/Paddle/pull/70359) +- kernel 计算相关优化。 [#65507](https://github.com/PaddlePaddle/Paddle/pull/65507), [#68541](https://github.com/PaddlePaddle/Paddle/pull/68541), [#71479](https://github.com/PaddlePaddle/Paddle/pull/71479), [#71403](https://github.com/PaddlePaddle/Paddle/pull/71403) +- XPU 相关优化。 [#67051](https://github.com/PaddlePaddle/Paddle/pull/67051) +- 其他优化例如推理过程的 pass 优化、动态 shape 在自动并行的优化及 FlashAttention 计算优化等。 [#68394](https://github.com/PaddlePaddle/Paddle/pull/68394), [#68696](https://github.com/PaddlePaddle/Paddle/pull/68696), [#68759](https://github.com/PaddlePaddle/Paddle/pull/68759), [#68791](https://github.com/PaddlePaddle/Paddle/pull/68791), [#69390](https://github.com/PaddlePaddle/Paddle/pull/69390), [#69961](https://github.com/PaddlePaddle/Paddle/pull/69961), [#69939](https://github.com/PaddlePaddle/Paddle/pull/69939), [#70455](https://github.com/PaddlePaddle/Paddle/pull/70455), [#70663](https://github.com/PaddlePaddle/Paddle/pull/70663), [#71290](https://github.com/PaddlePaddle/Paddle/pull/71123) + +### 其他 + +- 修改函数命名空间。 [#66818](https://github.com/PaddlePaddle/Paddle/pull/66818), [#67023](https://github.com/PaddlePaddle/Paddle/pull/67023), [#67114](https://github.com/PaddlePaddle/Paddle/pull/67114), [#67217](https://github.com/PaddlePaddle/Paddle/pull/67217), [#67524](https://github.com/PaddlePaddle/Paddle/pull/67524), [#67796](https://github.com/PaddlePaddle/Paddle/pull/67796), [#67881](https://github.com/PaddlePaddle/Paddle/pull/67881) +- 升级 OneDNN。 [#69917](https://github.com/PaddlePaddle/Paddle/pull/69917) +- 修改 pass 等级。 [#69524](https://github.com/PaddlePaddle/Paddle/pull/69524) +- 内存读写相关优化。 
[#65804](https://github.com/PaddlePaddle/Paddle/pull/65804), [#66923](https://github.com/PaddlePaddle/Paddle/pull/66923) +- 优化 GetValueName 相关签名。 [#66363](https://github.com/PaddlePaddle/Paddle/pull/66363), [#66559](https://github.com/PaddlePaddle/Paddle/pull/66559), [#66738](https://github.com/PaddlePaddle/Paddle/pull/66738) + +### 废弃 + +- 删除废弃文件、功能。 [#67514](https://github.com/PaddlePaddle/Paddle/pull/67514), [#67811](https://github.com/PaddlePaddle/Paddle/pull/67811), [#67911](https://github.com/PaddlePaddle/Paddle/pull/67911) + +## 7. 推理部署 + +重点围绕**新一代中间表示(PIR)生态建设**与**大模型推理优化**两大核心方向, 主要突破包括: -### 安全加强 -- 修复若干代码安全问题,防止恶意代码注入。[#60023](https://github.com/PaddlePaddle/Paddle/pull/60023),[#60544](https://github.com/PaddlePaddle/Paddle/pull/60544),[#60615](https://github.com/PaddlePaddle/Paddle/pull/60615) +1. **PIR-TensorRT 深度融合** -## 8.推理部署 -推理框架基于 PIR 升级了 GPU、XPU、CPU 硬件下 PASS,相比上个版本可大幅减少代码行数,提升开发效率。底层执行器升级到了新版异步执行器,在大多数模型上提升推理性能。完成基于 CINN 编译器进行推理加速的适配对接。针对这些特性增加了开关,用户可设置开启。此外,Paddle Inference 还支持了原生与 TensorRT 子图混合推理下直接加载优化后的序列化模型,可以减少启动时耗时。针对 Paddle-TensorRT 增加灵活控制节点计算精度、子图是否进入 TensorRT 计算等接口,方便调试。 性能优化上,GPU、XPU、CPU 都增加了较多 Transformer 及 LLM 计算加速的融合算子,如分组注意力机制融合算子、GQA 结构、WINT4 等支持,并支持通过 PASS 自动匹配。 + - 完成核心执行机制重构与代码优化,开发 50+算子转换器 + - 新增低精度支持(FP16/INT8)与 Generic Plugin 执行能力 + - 构建完整单测体系,支持模型加载/保存全流程 + +2. **大模型推理性能飞跃** + + - 新增混合专家系统(MoE)全流程支持,覆盖 Hopper 架构优化 + - 支持 128K 超长序列处理,提升长文本推理能力 + - 实现 FP8/W8A8 等前沿量化方案,降低显存占用 + +3. 
**基础架构全面升级** + + - OneDNN 升级至 3.6 版本,CPU 推理性能显著提升 + - 模型加载速度优化 40%+,支持 PIR 模型快速加载 + - 完善分布式推理支持,修复 allreduce 数据类型问题 ### 新增功能 -- Paddle-TensorRT - - Paddle-TensorRT 底层调用的 API 升级,在 TensorRT 版本大于 8.5 以上时,调用的 EnqueueV2 API (后续会被废弃)升级为 EnqueueV3 API。[#60807](https://github.com/PaddlePaddle/Paddle/pull/60807) - - 增加配置 config.exp_disable_tensorrt_subgraph()可以设置一些子图不进入 TensorRT。[#61967](https://github.com/PaddlePaddle/Paddle/pull/61967) - - 增加配置 config.exp_disable_tensorrt_dynamic_shape_ops()可设置动态 shape 输入的算子不进入 TensorRT,默认值为 False。[#62352](https://github.com/PaddlePaddle/Paddle/pull/62352) - - 增加配置 config.exp_specify_tensorrt_subgraph_precision()可以设置节点跑不同的精度类型。[#62402](https://github.com/PaddlePaddle/Paddle/pull/62402) -- Inference 中增加开启 CINN 编译器的开关,配置推理 config 时,通过 config.enable_cinn()开启 CINN。[#61949](https://github.com/PaddlePaddle/Paddle/pull/61949) -- Inference 升级使用 PIR 机制 - - config 增加 enable_new_ir()接口使能 PIR。[#61968](https://github.com/PaddlePaddle/Paddle/pull/61968) - - config 增加 set_optimization_level()接口可设置不同优化等级。[#61968](https://github.com/PaddlePaddle/Paddle/pull/61968) - - PIR 机制下 PASS 功能支持自定义 C++PASS。[#62468](https://github.com/PaddlePaddle/Paddle/pull/62468) - - 推理库对外暴露 PIR 相关实现头文件,支持用户基于 PIR 的二次开发,如自定义 Pass 开发等。[#61863](https://github.com/PaddlePaddle/Paddle/pull/61863),[#62293](https://github.com/PaddlePaddle/Paddle/pull/62293) - - PIR 机制下支持通过对 Predictor 注册 Hook 操作算子的输入输出。[#63101](https://github.com/PaddlePaddle/Paddle/pull/63101) -- 多层 Transformer 融合算子 fused_multi_transformer_op 融合算子支持 GQA 计算。[#64125](https://github.com/PaddlePaddle/Paddle/pull/64125) + +- 支持基于飞桨新一代中间表示(PIR)的 Paddle-TensorRT + - 核心基础执行机制功能开发及代码优化。[#64995](https://github.com/PaddlePaddle/Paddle/pull/64995),[#67054](https://github.com/PaddlePaddle/Paddle/pull/67054),[#67660](https://github.com/PaddlePaddle/Paddle/pull/67660),[#67755](https://github.com/PaddlePaddle/Paddle/pull/67755),[#70762](https://github.com/PaddlePaddle/Paddle/pull/70762), + - 算子 Marker 及 Converter 
开发。[#67753](https://github.com/PaddlePaddle/Paddle/pull/67753),[#67956](https://github.com/PaddlePaddle/Paddle/pull/67956),[#68084](https://github.com/PaddlePaddle/Paddle/pull/68084),[#67974](https://github.com/PaddlePaddle/Paddle/pull/67974),[#68395](https://github.com/PaddlePaddle/Paddle/pull/68395),[#68216](https://github.com/PaddlePaddle/Paddle/pull/68216),[#68529](https://github.com/PaddlePaddle/Paddle/pull/68529),[#68608](https://github.com/PaddlePaddle/Paddle/pull/68608), [#68663](https://github.com/PaddlePaddle/Paddle/pull/68663),[#68757](https://github.com/PaddlePaddle/Paddle/pull/68757),[#68614](https://github.com/PaddlePaddle/Paddle/pull/68614),[#68783](https://github.com/PaddlePaddle/Paddle/pull/68783),[#68775](https://github.com/PaddlePaddle/Paddle/pull/68775),[#68839](https://github.com/PaddlePaddle/Paddle/pull/68839),[#68686](https://github.com/PaddlePaddle/Paddle/pull/68686),[#68840](https://github.com/PaddlePaddle/Paddle/pull/68840),[#68941](https://github.com/PaddlePaddle/Paddle/pull/68941),[#69015](https://github.com/PaddlePaddle/Paddle/pull/69015),[#69038](https://github.com/PaddlePaddle/Paddle/pull/69038),[#69117](https://github.com/PaddlePaddle/Paddle/pull/69117),[#69208](https://github.com/PaddlePaddle/Paddle/pull/69208),[#69315](https://github.com/PaddlePaddle/Paddle/pull/69315),[#69261](https://github.com/PaddlePaddle/Paddle/pull/69261),[#68878](https://github.com/PaddlePaddle/Paddle/pull/68878),[#69705](https://github.com/PaddlePaddle/Paddle/pull/69705),[#69706](https://github.com/PaddlePaddle/Paddle/pull/69706),[#70170](https://github.com/PaddlePaddle/Paddle/pull/70170),[#70267](https://github.com/PaddlePaddle/Paddle/pull/70267),[#70429](https://github.com/PaddlePaddle/Paddle/pull/70429),[#69330](https://github.com/PaddlePaddle/Paddle/pull/69330),[#70507](https://github.com/PaddlePaddle/Paddle/pull/70507),[#70535](https://github.com/PaddlePaddle/Paddle/pull/70535),[#70667](https://github.com/PaddlePaddle/Paddle/pull/70667),[#70816](https:/
/github.com/PaddlePaddle/Paddle/pull/70816),[#70826](https://github.com/PaddlePaddle/Paddle/pull/70826),[#70955](https://github.com/PaddlePaddle/Paddle/pull/70955),[#71028](https://github.com/PaddlePaddle/Paddle/pull/71028),[#71013](https://github.com/PaddlePaddle/Paddle/pull/71013),[#71157](https://github.com/PaddlePaddle/Paddle/pull/71157),[#71231](https://github.com/PaddlePaddle/Paddle/pull/71231),[#69199](https://github.com/PaddlePaddle/Paddle/pull/69199),[#68956](https://github.com/PaddlePaddle/Paddle/pull/68956),[#66658](https://github.com/PaddlePaddle/Paddle/pull/66658),[#66811](https://github.com/PaddlePaddle/Paddle/pull/66811),[#67519](https://github.com/PaddlePaddle/Paddle/pull/67519),[#67877](https://github.com/PaddlePaddle/Paddle/pull/67877),[#68090](https://github.com/PaddlePaddle/Paddle/pull/68090),[#69086](https://github.com/PaddlePaddle/Paddle/pull/69086),[#68787](https://github.com/PaddlePaddle/Paddle/pull/68787),[#68778](https://github.com/PaddlePaddle/Paddle/pull/68778),[#69318](https://github.com/PaddlePaddle/Paddle/pull/69318),[#69995](https://github.com/PaddlePaddle/Paddle/pull/69995),[#70325](https://github.com/PaddlePaddle/Paddle/pull/70325),[#70817](https://github.com/PaddlePaddle/Paddle/pull/70817),[#70879](https://github.com/PaddlePaddle/Paddle/pull/70879),[#70875](https://github.com/PaddlePaddle/Paddle/pull/70875),[#71041](https://github.com/PaddlePaddle/Paddle/pull/71041),[#68876](https://github.com/PaddlePaddle/Paddle/pull/68876) + - Generic Plugin 执行功能支持。[#66634](https://github.com/PaddlePaddle/Paddle/pull/66634),[#70251](https://github.com/PaddlePaddle/Paddle/pull/70251) + - 低精度(FP16,INT8)功能支持。[#69597](https://github.com/PaddlePaddle/Paddle/pull/69597),[#71127](https://github.com/PaddlePaddle/Paddle/pull/71127), + - 单测体系、pass 
使用支持等辅助功能完善[#67525](https://github.com/PaddlePaddle/Paddle/pull/67525),[#68034](https://github.com/PaddlePaddle/Paddle/pull/68034),[#71281](https://github.com/PaddlePaddle/Paddle/pull/71281),[#71235](https://github.com/PaddlePaddle/Paddle/pull/71235),[#67568](https://github.com/PaddlePaddle/Paddle/pull/67568),[#70139](https://github.com/PaddlePaddle/Paddle/pull/70139),[#70529](https://github.com/PaddlePaddle/Paddle/pull/70529) +- 大模型推理优化 + - 新增 fused_moe 功能支持(基础支持/非规范 TopK/Hopper 架构)[#66084](https://github.com/PaddlePaddle/Paddle/pull/66084), [#67425](https://github.com/PaddlePaddle/Paddle/pull/67425), [#67732](https://github.com/PaddlePaddle/Paddle/pull/67732) + - 支持混合精度计算(GQA 混合精度/BF16 注册)[#65078](https://github.com/PaddlePaddle/Paddle/pull/65078), [#67769](https://github.com/PaddlePaddle/Paddle/pull/67769) + - 新增推理优化功能(动态图推理/128K 长序列支持)[#65962](https://github.com/PaddlePaddle/Paddle/pull/65962), [#70088](https://github.com/PaddlePaddle/Paddle/pull/70088) + - 新增量化推理算子实现(FP8 W8A8 计算/weight only int4 量化)[#65441](https://github.com/PaddlePaddle/Paddle/pull/65441), [#64094](https://github.com/PaddlePaddle/Paddle/pull/64094) ### 功能完善 -- 推理支持直接加载优化后的模型,使得可以完全跳过 IR 优化,使用该方式部署可以最大程度降低框架开销。[#61598](https://github.com/PaddlePaddle/Paddle/pull/61598) -- 支持加载保存下来的经过 IR PASS 优化后的模型推理时,重新指定 shape 范围信息文件。[#60457](https://github.com/PaddlePaddle/Paddle/pull/60457) -- 控制流算子的子图内可收集 Shape 信息,支持使用 Paddle-TensorRT 推理加速。[#60451](https://github.com/PaddlePaddle/Paddle/pull/60451) ,[#59588](https://github.com/PaddlePaddle/Paddle/pull/59588) -- GPU 原生推理的混合精度 PASS(auto_mixed_precision_pass)支持处理稀疏 Tensor。[#62656](https://github.com/PaddlePaddle/Paddle/pull/62656) -- XPU 硬件相关 - - XPU 针对 Conv 和 FC 的融合 PASS 支持 Float 到 INT31 类型的转换。[#59981](https://github.com/PaddlePaddle/Paddle/pull/59981) - - XPU 的 strided slice 算子支持设置 strides 未负数。 [#62268](https://github.com/PaddlePaddle/Paddle/pull/62268) - - XPU 的多层 Encoder 融合 PASS 可以自适应序列长度并支持变长 [#63825](https://github.com/PaddlePaddle/Paddle/pull/63825) 
-- Paddle TensorRT INT8 计算模式下支持 tile 算子进入 TensorRT 计算,提升部分模型 INT8 性能。 [#60189](https://github.com/PaddlePaddle/Paddle/pull/60189) - -### 模型压缩 -主要针对训练后量化(Post Training Quantization,PTQ)和量化训练(Quantization Aware Training,QAT)做了 bug 修复和功能优化。 -- 支持按照通道内分组的模拟量化[#61828](https://github.com/PaddlePaddle/Paddle/pull/61828) -- 支持动态图下自动保存量化 scale 到模型参数文件中[#59441](https://github.com/PaddlePaddle/Paddle/pull/59441) -- 去除中 dataloader 必须是 DataLoader 实例的限制[#61798](https://github.com/PaddlePaddle/Paddle/pull/61798) + +- Inference 在 PIR 下功能机制完善 + - 执行器支持加载.json 模型[#65223](https://github.com/PaddlePaddle/Paddle/pull/65223) + - 支持可控制开启 PIR 模式开关[#65596](https://github.com/PaddlePaddle/Paddle/pull/65596) +- 大模型推理机制完善 + - 优化 gemm 算法搜索(cublaslt 全局搜索/离线缓存)[#65597](https://github.com/PaddlePaddle/Paddle/pull/65597), [#66132](https://github.com/PaddlePaddle/Paddle/pull/66132) + - 增强类型系统兼容性(PD_VISIT_FLOATING_AND_HALF_TYPES)[#71022](https://github.com/PaddlePaddle/Paddle/pull/71022) + - 优化注意力机制(多块 MMHA/XPU 支持)[#67211](https://github.com/PaddlePaddle/Paddle/pull/67211), [#68104](https://github.com/PaddlePaddle/Paddle/pull/68104) ### 性能优化 -- 推理执行器升级,保证性能不变情况下,大幅度降低运行时显存占用,可通过 config.enable_use_executor(True)来使用。[#57920](https://github.com/PaddlePaddle/Paddle/pull/57920),[#58452](https://github.com/PaddlePaddle/Paddle/pull/58452),[#63350](https://github.com/PaddlePaddle/Paddle/pull/63350),[#64466](https://github.com/PaddlePaddle/Paddle/pull/64466) -- 升级 paddle inference 的 oneDNN 版本到 v3.4,其中整体性能相比 v3.3 版本有提升。 [#64661](https://github.com/PaddlePaddle/Paddle/pull/64661) -- 升级基于 CUTLASS 支持矩阵乘与激活的融合计算。 ([#61925](https://github.com/PaddlePaddle/Paddle/pull/61925)) - -#### PIR 机制下新增通用 PASS -- 添加 identity_op_clean_pass 和 matmul_scale_fuse_pass。 [#59840](https://github.com/PaddlePaddle/Paddle/pull/59840) -- 添加 fused_flash_attn_pass,该 pass 会调用 flash_attention 替换原始的 attention 
计算。[#64213](https://github.com/PaddlePaddle/Paddle/pull/64213),[#64707](https://github.com/PaddlePaddle/Paddle/pull/64707),[#63304](https://github.com/PaddlePaddle/Paddle/pull/63304) -- 推理 PIR 新架构下全新升级 layout 布局调整算法,支持 conv 类、norm 类等算子的 NHWC 推理,在 SD 模型上测试大幅提升性能。[#63628](https://github.com/PaddlePaddle/Paddle/pull/63628),[#64634](https://github.com/PaddlePaddle/Paddle/pull/64634),[#64658](https://github.com/PaddlePaddle/Paddle/pull/64658),[#64708](https://github.com/PaddlePaddle/Paddle/pull/64708),[#64830](https://github.com/PaddlePaddle/Paddle/pull/64830),[#64896](https://github.com/PaddlePaddle/Paddle/pull/64896) -- 增加 remove_redundant_transpose PASS。 [#63357](https://github.com/PaddlePaddle/Paddle/pull/63357) -- 在推理中使能 CSE PASS,提升推理性能。[#64523](https://github.com/PaddlePaddle/Paddle/pull/64523) - -#### GPU 性能优化 -含新增融合算子及 PIR 机制下新增 PASS。 -- 稀疏卷积算子(sparse conv)性能优化,提升 BEV 等模型的推理性能。[#63067](https://github.com/PaddlePaddle/Paddle/pull/63067) -- 新增基于 flash attention 的融合 PASS。 [#63220](https://github.com/PaddlePaddle/Paddle/pull/63220) -- 理支持 elementwise_add+group_norm+silu 激活的算子融合 pattern 及其对应融合 kernel。[#64199](https://github.com/PaddlePaddle/Paddle/pull/64199) -- 矩阵乘计算支持 groupwise 的 Weight only INT4 计算。[#60422](https://github.com/PaddlePaddle/Paddle/pull/60422) 、[#63212](https://github.com/PaddlePaddle/Paddle/pull/63212) 、[#60204](https://github.com/PaddlePaddle/Paddle/pull/60204)) -- 分组注意力机制融合算子 block_multi_head_attention 的算子实现支持 KV Cache 量化。[#59951](https://github.com/PaddlePaddle/Paddle/pull/59951)) -- 推理使用 CUTLASS 升级 conv 融合算子实现并支持 PASS 自动融合支持 bias 与 activation,新算子相较原先 cuDNN 实现有显著的性能加速。需通过 config.exp_enable_use_cutlass(True)使用。[#64201](https://github.com/PaddlePaddle/Paddle/pull/64201)、[#64641](https://github.com/PaddlePaddle/Paddle/pull/64641) -- 添加 blha_get_max_len 算子并去除了 block_multihead_attention 中每次调用 get_max_len 的行为,该功能应用于大模型动态推理加速。[#64246](https://github.com/PaddlePaddle/Paddle/pull/64246) -- 数据排布优化 PASS 禁止 conv 融合算子 FP32 精度类型时使用 NHWC 模式计算,原因是 cuDNN 
在此条件下会导致性能退化。[#63400](https://github.com/PaddlePaddle/Paddle/pull/63400) -- GPU 峰值显存优化,升级底层接口 TryShrinkMemory 升级支持 GPU place 下支持释放显存池空闲显存,某些场景下可大幅度削减峰值显存。[#61319](https://github.com/PaddlePaddle/Paddle/pull/61319) - -#### CPU 性能优化 -含新增融合算子及 PIR 机制下新增 PASS 并优化部分 Kernel。 -- 添加 scale_matmul_fuse_pass [#63313](https://github.com/PaddlePaddle/Paddle/pull/63313) -- 融合算子 fused_bias_residual_layernorm 和 fused_rms_norm 添加 CPU 实现,大幅度推理速度。[#63196](https://github.com/PaddlePaddle/Paddle/pull/63196)、[#63165](https://github.com/PaddlePaddle/Paddle/pull/63165) -- 新增 Deconvolution kernel 的缓存优化,从而大大提升该算子的执行速度。 [#60922](https://github.com/PaddlePaddle/Paddle/pull/60922) -- PIR 下新增 depthwise_conv 融合 PASS,将 depthwise_conv 算子转换为 conv2d,从而使用 onednn conv2d 的 kernel 优化,提升该算子推理速度。 [#63051](https://github.com/PaddlePaddle/Paddle/pull/63051) -- PIR 下新增 Conv 与激活的融合 PASS(conv_activation_mkldnn_fuse_pass),支持 conv 和 13 种激活函数进行融合,大大提升 conv 相关算子的推理速度。 [#63145](https://github.com/PaddlePaddle/Paddle/pull/63145) -- PIR 下新增多种算子和 unsqueeze 的算子融合 PASS(operator_unsqueeze_onednn_fuse_pass),提升推理速度。 [#63592](https://github.com/PaddlePaddle/Paddle/pull/63592) -- PIR 下新增将 reshape 融合进多个算子的 PASS (operator_reshape_onednn_fuse_pass)。 [#63812](https://github.com/PaddlePaddle/Paddle/pull/63812) -- PIR 下新增 scale 融合 PASS (operator_scale_onednn_fuse_pass)。 [#63811](https://github.com/PaddlePaddle/Paddle/pull/63811) -- PIR 下新增 conv 与 bias 融合的 PASS (conv2d_transpose_bias 算子) 。 [#62241](https://github.com/PaddlePaddle/Paddle/pull/62241) -- PIR 下新增 onednn_placement_pass,支持了 151 种算子从 Phi 算子转换为 oneDNN 算子,从而使用 oneDNN 高性能库进行优化,提升推理速度。 [#63982](https://github.com/PaddlePaddle/Paddle/pull/63982) -- PIR 下新增 elementwise 类型算子和 13 种激活函数的融合,大大提升 cpu 下开启 onednn 的推理速度。 [#63516](https://github.com/PaddlePaddle/Paddle/pull/63516) -- PIR 下新增多个 conv + concat + 激活函数和 fused_conv + concat + 激活函数的融合,大大提升了 conv 下有 concat 和激活函数的情况下推理速度。 [#62993](https://github.com/PaddlePaddle/Paddle/pull/62993)、 
[#62713](https://github.com/PaddlePaddle/Paddle/pull/62713) -- PIR 下新增 matmul+add 算子融合 PASS (matmul_elementwise_add_fuse_pass)。[#62715](https://github.com/PaddlePaddle/Paddle/pull/62715) -- PIR 下新增 scale 参数折叠 PASS(scale_matmul_fuse_pass)。[#63313](https://github.com/PaddlePaddle/Paddle/pull/63313) -- PIR 下新增 softplus 与 12 种激活函数融合 PASS(softplus_activation_fuse_pass)。[#63617](https://github.com/PaddlePaddle/Paddle/pull/63617) -- PIR 下新增 fc 算子转换 PASS(fc_onednn_enable_pass)。[#63518](https://github.com/PaddlePaddle/Paddle/pull/63518) -- PIR 下新增自注意力算子融合 PASS(self_attention_fuse_pass)。[#63726](https://github.com/PaddlePaddle/Paddle/pull/63726) -- PIR 下新增 fc 与 12 种激活函数融合 PASS(fc_activation_fuse_pass)。[#63853](https://github.com/PaddlePaddle/Paddle/pull/63853) -- PIR 下新增 BatchNorm 折叠 PASS(conv2d_bn_onednn_fuse_pass),扩增后续 pass 的融合几率。[#64524](https://github.com/PaddlePaddle/Paddle/pull/64524) -- PIR 下新增 matmul 与 12 种激活函数融合 PASS(matmul_activation_fuse_pass)。[#62901](https://github.com/PaddlePaddle/Paddle/pull/62901) -- PIR 下新增 reshape + transpose + reshape 融合 PASS(shuffle_channel_detect_pass),在特定条件下融合为 shuffle_channel 算子。[#64053](https://github.com/PaddlePaddle/Paddle/pull/64053) -- PIR 下新增 reshape + transpose + matmul 融合 PASS(reshape_transpose_matmul_fuse_pass)。[#62998](https://github.com/PaddlePaddle/Paddle/pull/62998) -- PIR 下新增 matmul + transpose + reshape 融合 PASS(matmul_transpose_reshape_fuse_pass),在部分场景下显著提升性能。[#63151](https://github.com/PaddlePaddle/Paddle/pull/63151) + +- OneDNN 升级到 3.6 版本(在 GNR/EMR 设备上模型推理性能获得普遍提升)[#69386](https://github.com/PaddlePaddle/Paddle/pull/69386) +- 算子性能优化(layer_norm/top_p_sampling)[#65711](https://github.com/PaddlePaddle/Paddle/pull/65711) +- 模型加载加速(常规/PIR 模型)[#69110](https://github.com/PaddlePaddle/Paddle/pull/69110), [#70219](https://github.com/PaddlePaddle/Paddle/pull/70219) ### Bug 修复 -- 修复 faster_rcnn_swin_tiny_fpn_1x_coco 等模型中的混合精度转换问题,解决了 mixed_precision_pass 的错误。 
[#64673](https://github.com/PaddlePaddle/Paddle/pull/64673) -- 阻止 fused_conv2d_add_act pass 在激活函数为 sigmoid 中被生效(cudnn 版本 8.0~8.7 之间时,融合 conv2d 和 sigmoid 会导致性能退化)。[#64717](https://github.com/PaddlePaddle/Paddle/pull/64717) -- 修复 self_dp_attention 和 fused_layer_norm_avx_kernel 在 Clang12 中的编译问题。 [#63414](https://github.com/PaddlePaddle/Paddle/pull/63414) -- 修复部分模型在 IR/Pass 阶段 qdq 算子中的 scale 和 zeroPoint 过早删除的问题。 [#62225](https://github.com/PaddlePaddle/Paddle/pull/62225) -- 修复同时开启 Config.UseOptimizedModel()和 config.EnableMemoryOptim()时导致报错的问题。 [#62501](https://github.com/PaddlePaddle/Paddle/pull/62501) -- 增加 matmul_scale_fuse_pass 的约束,其中输入 w 必须是权重,否则不会匹配该 pass。 [#62850](https://github.com/PaddlePaddle/Paddle/pull/62850) -- 保持 inference 模型输出键顺序保证与动态图模型导出时的顺序一致。 [#63791](https://github.com/PaddlePaddle/Paddle/pull/63791) -- 修复子图在常量折节 PASS 在"被折叠的 op 和其输入输出不在一个子图时"出错问题。 [#62148](https://github.com/PaddlePaddle/Paddle/pull/62148) -- 修复 PaddleTRT 模式下若干运行时问题。包括 int8 模式下 yolo_box 算子引起的量化校准表生成失败、reduce 算子 dim 属性数据类型未正确处理引起的报错。[#61596](https://github.com/PaddlePaddle/Paddle/pull/61596) -- 修复混合精度推理模式下若干运行时报错问题。包括 fused conv2d 算子间共享权重未正确转换权重 layout、fused conv2d 算子 backend 未正常选择为 cuDNN、fused conv2d 算子在 NHWC 下错误处理 bias 维度、错误处理 norm 类算子的输入数据类型引起的报错。[#60955](https://github.com/PaddlePaddle/Paddle/pull/60955)、[#60076](https://github.com/PaddlePaddle/Paddle/pull/60076)、[#63007](https://github.com/PaddlePaddle/Paddle/pull/63007)、[#63988](https://github.com/PaddlePaddle/Paddle/pull/63988) -- 修复 config.delete_pass 功能未生效问题。[#61056](https://github.com/PaddlePaddle/Paddle/pull/61056) -- PIR 中修复 While 控制流的 GC 机制,提前回收不需要的输入,减少峰值显存,例如在 LLaMA 7B 模型中减少 2GB 显存。[#63062](https://github.com/PaddlePaddle/Paddle/pull/63062) -- 修正了 OneDNN mean kernel 回退错误。 [#64676](https://github.com/PaddlePaddle/Paddle/pull/64676) -- 修正 conv_bias_fuse_pass 新增了若干强约束, 例如 bias 的 shape 不能为 1,从而保证 pass 推理结果稳定。 [#64412](https://github.com/PaddlePaddle/Paddle/pull/64412) -- 修正 conv_elementwise_add_onednn_fuse_pass 新增了若干强约束,例如 
conv2d_out 和 residual_param 的尺寸必须一致,从而保证 pass 推理稳定。 [#64448](https://github.com/PaddlePaddle/Paddle/pull/64448) -- 修复在特定情况下,反复插入量化反量化算子的问题 [#63082](https://github.com/PaddlePaddle/Paddle/pull/63082) - -## 9.硬件适配 - -### 适配方案 (Custom Device) -飞桨硬件接入本次新增了对 4 款硬件昆仑 XPU、昇腾 NPU、海光 DCU 和寒武纪 MLU 的日常发版支持,同时通过大模型训练和推理部署的打磨修复了分布式通信中存在的问题,并通过显存优化、计算和通信的 overlap 等功能进行性能优化。其次、本次各个硬件还新增了大量 BFloat16 数据类型的算子支持,以及众多算子融合 Pass 和各个硬件上的融合算子,通过软硬联合的方式接入硬件大 Transformer 算子库来充分提升大模型性能。 - -#### 新增功能 -- 新增分布式策略 sharding stage1 v2 的支持。[#61500](https://github.com/PaddlePaddle/Paddle/pull/61500) -- 支持分布式通信模块支持 BF16 数据类型。新增部分算子对 BF16 数据类型的支持,如 empty、shape 等。[#60768](https://github.com/PaddlePaddle/Paddle/pull/60768),[#62140](https://github.com/PaddlePaddle/Paddle/pull/62140),[#62604](https://github.com/PaddlePaddle/Paddle/pull/62604) -- 新增 get_comm_name 接口的支持,对 memory stat 功能支持, 支持 Profiler 对内存时间的记录。[#62556](https://github.com/PaddlePaddle/Paddle/pull/62556),[#61030](https://github.com/PaddlePaddle/Paddle/pull/61030),[#62292](https://github.com/PaddlePaddle/Paddle/pull/62292) -- 新增部分融合策略和算子的支持,包括 silu_fuse_pass, conv_elementwise_add_act_fuse_pass, generator offset 的支持。 [#60595](https://github.com/PaddlePaddle/Paddle/pull/60595),[#60708](https://github.com/PaddlePaddle/Paddle/pull/60708),[#60616](https://github.com/PaddlePaddle/Paddle/pull/60616) - -#### 性能优化 -- 分布式通信策略 Sharing 在 Broadcast 参数采用异步策略,提升计算和通信的 overlap。 [#59745](https://github.com/PaddlePaddle/Paddle/pull/59745) -- 新增 STRIDED Layout 算子支持,提升算子性能。[#62532](https://github.com/PaddlePaddle/Paddle/pull/62532),[#62697](https://github.com/PaddlePaddle/Paddle/pull/62697),[#62649](https://github.com/PaddlePaddle/Paddle/pull/62649) -- 优化 elementwise_mul 算子内存使用。[#62377](https://github.com/PaddlePaddle/Paddle/pull/62377) - -#### Bug 修复 -- 修复分布式策略 Sharing 
下的错误。[#61942](https://github.com/PaddlePaddle/Paddle/pull/61942),[#62236](https://github.com/PaddlePaddle/Paddle/pull/62236),[#62305](https://github.com/PaddlePaddle/Paddle/pull/62305),[#62535](https://github.com/PaddlePaddle/Paddle/pull/62535),[#62572](https://github.com/PaddlePaddle/Paddle/pull/62572),[#61601](https://github.com/PaddlePaddle/Paddle/pull/61601) -- 修复 c_embedding 算子不在 PHI namespace 下导致的算子无法注册的问题。[#60774](https://github.com/PaddlePaddle/Paddle/pull/60774) -- 修复 xccl_comm 释放问题。[#60465](https://github.com/PaddlePaddle/Paddle/pull/60465) -- 修复 index_put 算子 fallbacking cpu 时导致的数据地址错误。[#61842](https://github.com/PaddlePaddle/Paddle/pull/61842) -- 修复 stream_safe_custom_device_allocator 的问题。[#63369](https://github.com/PaddlePaddle/Paddle/pull/63369) -- 修复分布式下 worker 端口冲突问题。[#61409](https://github.com/PaddlePaddle/Paddle/pull/61409) -- 修复 comm 数据类型以提升设备兼容性。[#62306](https://github.com/PaddlePaddle/Paddle/pull/62306) -- 统一通信数据类型的使用为 phi::DataType。[#62464](https://github.com/PaddlePaddle/Paddle/pull/62464),[#62562](https://github.com/PaddlePaddle/Paddle/pull/62562) -- 修复 PD_ConfigEnableCustomDevice 缺少 precision 参数问题。[#63702](https://github.com/PaddlePaddle/Paddle/pull/63702) - -### 昆仑 XPU - -#### 新增功能 -- 新增部分算子对 BF16 数据类型的支持,包括 compare_kernel 与 add reduce_all_kernel([#63602](https://github.com/PaddlePaddle/Paddle/pull/63602))、empty([#60212](https://github.com/PaddlePaddle/Paddle/pull/60212))、hybrid_parallel_optimizer([#60213](https://github.com/PaddlePaddle/Paddle/pull/60213))、reduce_max/reduce_min([#60453](https://github.com/PaddlePaddle/Paddle/pull/60453))、all_reduce/concat/split([#62364](https://github.com/PaddlePaddle/Paddle/pull/62364))、tile/tile_grad([#63075](https://github.com/PaddlePaddle/Paddle/pull/63075))、accuracy([#63863](https://github.com/PaddlePaddle/Paddle/pull/63863)), swiglu/set_value([#64070](https://github.com/PaddlePaddle/Paddle/pull/64070))、amp_master_grad([#63865](https://github.com/PaddlePaddle/Paddle/pull/63865))、c_concat 
([#63403](https://github.com/PaddlePaddle/Paddle/pull/63403))、flatten ([#63997](https://github.com/PaddlePaddle/Paddle/pull/63997))、compare_op ([#64473](https://github.com/PaddlePaddle/Paddle/pull/64473))、moment1/moment2 ([#62688](https://github.com/PaddlePaddle/Paddle/pull/62688))、fused_rope ([#60064](https://github.com/PaddlePaddle/Paddle/pull/60064))、c_softmax_with_cross_entropy ([#60472](https://github.com/PaddlePaddle/Paddle/pull/60472))、elementwise_pow/square/sin/cos ([#60402](https://github.com/PaddlePaddle/Paddle/pull/60402))、strided_slice ([#60382](https://github.com/PaddlePaddle/Paddle/pull/60382))、tile/sigmoid_grad ([#60119](https://github.com/PaddlePaddle/Paddle/pull/60119))、 elementwise_sub/elementwise_div ([#60386](https://github.com/PaddlePaddle/Paddle/pull/60386))、softmax_with_cross_entropy ([#63759](https://github.com/PaddlePaddle/Paddle/pull/63759)) -- 新增部分算子对 INT8 数据类型的支持,包括 multi_encoder_xpu ([#61212](https://github.com/PaddlePaddle/Paddle/pull/61212))、qkv_attention ([#63105](https://github.com/PaddlePaddle/Paddle/pull/63105)) -- 更新昆仑 SDK 版本包括 BKCL、XHPC、XCCL 等。 [#59895](https://github.com/PaddlePaddle/Paddle/pull/59895)、[#59888](https://github.com/PaddlePaddle/Paddle/pull/59888)、[#63624](https://github.com/PaddlePaddle/Paddle/pull/63624), [#60305](https://github.com/PaddlePaddle/Paddle/pull/60305), [#62076](https://github.com/PaddlePaddle/Paddle/pull/62076), [#62646](https://github.com/PaddlePaddle/Paddle/pull/62646), [#63520](https://github.com/PaddlePaddle/Paddle/pull/63520), [#64163](https://github.com/PaddlePaddle/Paddle/pull/64163), [#64326](https://github.com/PaddlePaddle/Paddle/pull/64326), [#60617](https://github.com/PaddlePaddle/Paddle/pull/60617), [#60377](https://github.com/PaddlePaddle/Paddle/pull/60377), [#60421](https://github.com/PaddlePaddle/Paddle/pull/60421), [#60598](https://github.com/PaddlePaddle/Paddle/pull/60598), [#61199](https://github.com/PaddlePaddle/Paddle/pull/61199) -- 新增对 memory stat 
功能支持。[#61116](https://github.com/PaddlePaddle/Paddle/pull/61116) -- 新增多 stream 支持,且可以给每个 stream 分配默认的 l3/gm buffer 大小。 [#62729](https://github.com/PaddlePaddle/Paddle/pull/62729) -- 新增 nonzero 算子支持支持 simulator XPUSIM_SKIP_RUN 模式。[#60224](https://github.com/PaddlePaddle/Paddle/pull/60224)。[#60388](https://github.com/PaddlePaddle/Paddle/pull/60388) -- 新增 stride_slice 和 stride_slice_grad 算子支持 strides < 0。 [#62749](https://github.com/PaddlePaddle/Paddle/pull/62749) -- 新增 rotary_embedding 对 use_neox_rotary_style == True 的支持。[#64090](https://github.com/PaddlePaddle/Paddle/pull/64090) -- 新增融合 Pass 和融合算子,包括 cross_attention ([#63203](https://github.com/PaddlePaddle/Paddle/pull/63203))、fused_bias_act ([#62232](https://github.com/PaddlePaddle/Paddle/pull/62232))、fused_layernorm ([#62228](https://github.com/PaddlePaddle/Paddle/pull/62228))、group_norm_silu_xpu_fuse_pass ([#63342](https://github.com/PaddlePaddle/Paddle/pull/63342)) -- 新增对分布式策略 sharding stage3 的支持。 [#57457](https://github.com/PaddlePaddle/Paddle/pull/57457) -- 新增 tf32 fc quantization 模式的支持。[#62273](https://github.com/PaddlePaddle/Paddle/pull/62273) -- 新增 flash attention 算子。[#60065](https://github.com/PaddlePaddle/Paddle/pull/60065) -- 新增 roformer relative embedding pass & kernel 并支持 multi_encoder_xpu。[#62089](https://github.com/PaddlePaddle/Paddle/pull/62089) -- 新增 pp + sharding 策略支持。[#63640](https://github.com/PaddlePaddle/Paddle/pull/63640) -- 升级 XPU 通信库架构以支持动静统一的通信库功能。[#63817](https://github.com/PaddlePaddle/Paddle/pull/63817) - -#### 性能优化 -- 新增 XHPC buffer manager 以提升 Paddle 和 XHPC 内存协同性能。 [#63924](https://github.com/PaddlePaddle/Paddle/pull/63924) -- 提升 TensorSetConstantXPU 性能,并支持 BF16 数据类型。[#63920](https://github.com/PaddlePaddle/Paddle/pull/63920),[#61818](https://github.com/PaddlePaddle/Paddle/pull/61818) -- 融合多个 group norm + silu + conv 模块, 压缩显存。[#62892](https://github.com/PaddlePaddle/Paddle/pull/62892) -- 优化 comm manager 中 XPU 显存分配。[#64139](https://github.com/PaddlePaddle/Paddle/pull/64139) -- 
优化算子性能,包括 mean_all_grad ([#61148](https://github.com/PaddlePaddle/Paddle/pull/61148))、dropout_v2 ([#61029](https://github.com/PaddlePaddle/Paddle/pull/61029))、fused_rotary_position_embedding ([#62846](https://github.com/PaddlePaddle/Paddle/pull/62846))、cross_entropy ([#63159](https://github.com/PaddlePaddle/Paddle/pull/63159))、elementwise_add ([#64289](https://github.com/PaddlePaddle/Paddle/pull/64289))、fused_gemm_epilogue ([#61350](https://github.com/PaddlePaddle/Paddle/pull/61350)、check_nan_or_inf ([#60853](https://github.com/PaddlePaddle/Paddle/pull/60853)) -- XPU 硬件下新增 qk_qkv_attention_xpu_fuse_pass 和 qkv_attention_xpu_kernel。 [#60089](https://github.com/PaddlePaddle/Paddle/pull/60089) -- XPU 硬件下新增 rotary position 编码的融合算子支持 elementwise_mul + strided_slice + sin/cos+ stack 融合为 1 个算子。 [#60025](https://github.com/PaddlePaddle/Paddle/pull/60025) -- 添加 group_norm_silu_xpu_fuse_pass。 [#62689](https://github.com/PaddlePaddle/Paddle/pull/62689) -- 添加 weight_only_linear_xpu_pass。 [#64185](https://github.com/PaddlePaddle/Paddle/pull/64185) -- 新增 block_multihead_attention 算子及 PASS,支持 LLaMA2 模型在 XPU 设备中的大模型推理。 [#65036](https://github.com/PaddlePaddle/Paddle/pull/65036) -- 支持 squeeze_excitation_block_xpu_kernel 的 float16 类型。 [#61023](https://github.com/PaddlePaddle/Paddle/pull/61023) - -#### Bug 修复 -- 修复 tile 算子对 0 维 Tensor 的支持。 [#64279](https://github.com/PaddlePaddle/Paddle/pull/64279) -- 修复 group_norm_silu_fuse_pass。 [#63449](https://github.com/PaddlePaddle/Paddle/pull/63449) -- 修复 XPU API GM 显存问题。[#60260](https://github.com/PaddlePaddle/Paddle/pull/60260),[#60387](https://github.com/PaddlePaddle/Paddle/pull/60387),[#62940](https://github.com/PaddlePaddle/Paddle/pull/62940) -- 修复分布式策略 Sharing stage1 v2 的错误。[#64209](https://github.com/PaddlePaddle/Paddle/pull/64209) -- 修复 XPU constant 问题。[#60763](https://github.com/PaddlePaddle/Paddle/pull/60763) -- 修复部分算子问题,包括 AdamW ([#62251](https://github.com/PaddlePaddle/Paddle/pull/62251))、dropout_v3 
([#62726](https://github.com/PaddlePaddle/Paddle/pull/62726))、softmax([#63780](https://github.com/PaddlePaddle/Paddle/pull/63780)) 、fused rope embedding ([#62143](https://github.com/PaddlePaddle/Paddle/pull/62143))、elementwise_add ([#60252](https://github.com/PaddlePaddle/Paddle/pull/60252))、resnet_basic_block ([#62914](https://github.com/PaddlePaddle/Paddle/pull/62914)) -- 修复 XPU 运行和安装相关问题。[#60028](https://github.com/PaddlePaddle/Paddle/pull/60028),[#61970](https://github.com/PaddlePaddle/Paddle/pull/61970) -- 修复 XPU 编译 bug。[#63307](https://github.com/PaddlePaddle/Paddle/pull/63307) -- 修复 XPU 通信库初始化时端侧内存相关的 bug。[#64396](https://github.com/PaddlePaddle/Paddle/pull/64396) - -### 海光 DCU - -#### 新增功能 -- 新增对海光 DCU K100 支持。[#63535](https://github.com/PaddlePaddle/Paddle/pull/63535) -- 支持 complex64/128 数据类型,并支持 fused_bias_residual_layernorm、fused_bias_dropout_residual_layer_norm、rms_norm 等融合算子。 [#63217](https://github.com/PaddlePaddle/Paddle/pull/63217) - -#### Bug 修复 -- 修复 DTK 和 ROCM 版本升级的编译错误问题。 [#62832](https://github.com/PaddlePaddle/Paddle/pull/62832),[#62931](https://github.com/PaddlePaddle/Paddle/pull/62931),[#61872](https://github.com/PaddlePaddle/Paddle/pull/61872),[#63738](https://github.com/PaddlePaddle/Paddle/pull/63738) - -## 10.环境更新 -此版本飞桨完成基础依赖库的发版和更新同步,移除了不再更新的老旧依赖库。完成了多项优化提升编译效率、兼容性,完善 CI 流水线监测功能以提升用户安装体验。修复了多个已知编译问题,完善 paddle 的编译系统,新增了一些特性支持。通过相关优化工作,飞桨框架的编译安装体验进一步提升,给开发者带来更好的使用和开发体验。 - -### 新增支持 -- 支持用户安装 paddle 不依赖本地的 cuda 和 cudnn,提升用户安装体验。[#60841](https://github.com/PaddlePaddle/Paddle/pull/60841),[#61973](https://github.com/PaddlePaddle/Paddle/pull/61973),[#61862](https://github.com/PaddlePaddle/Paddle/pull/61862),[#61235](https://github.com/PaddlePaddle/Paddle/pull/61235),[#61209](https://github.com/PaddlePaddle/Paddle/pull/61209),[#61653](https://github.com/PaddlePaddle/Paddle/pull/61653),[#64083](https://github.com/PaddlePaddle/Paddle/pull/64083) -- 全面支持 CUDA 12.3,同时完成 cuda10.2 
退场。[#63356](https://github.com/PaddlePaddle/Paddle/pull/63356),[#60299](https://github.com/PaddlePaddle/Paddle/pull/60299),[#64171](https://github.com/PaddlePaddle/Paddle/pull/64171),[#62189](https://github.com/PaddlePaddle/Paddle/pull/62189),[#63392](https://github.com/PaddlePaddle/Paddle/pull/63392),[#64228](https://github.com/PaddlePaddle/Paddle/pull/64228),[#62498](https://github.com/PaddlePaddle/Paddle/pull/62498),[#64298](https://github.com/PaddlePaddle/Paddle/pull/64298) -- 全面支持 Python 3.12,带来了更强大的语言特性和性能优化,同时完成 python3.7 退场。[#59875](https://github.com/PaddlePaddle/Paddle/pull/59875),[#59877](https://github.com/PaddlePaddle/Paddle/pull/59877),[#59876](https://github.com/PaddlePaddle/Paddle/pull/59876) -- 其他 paddle 依赖的第三方库升级:[#63741](https://github.com/PaddlePaddle/Paddle/pull/63741),[#64447](https://github.com/PaddlePaddle/Paddle/pull/64447),[#60195](https://github.com/PaddlePaddle/Paddle/pull/60195),[#60110](https://github.com/PaddlePaddle/Paddle/pull/60110),[#61509](https://github.com/PaddlePaddle/Paddle/pull/61509) - -### 编译优化 -- 优化了 paddle 的 CMake 
代码,显著提升了编译效率和编译体验。[##59995](https://github.com/PaddlePaddle/Paddle/pull/59995),[#60167](https://github.com/PaddlePaddle/Paddle/pull/60167),[#61052](https://github.com/PaddlePaddle/Paddle/pull/61052),[#59995](https://github.com/PaddlePaddle/Paddle/pull/59995),[#59607](https://github.com/PaddlePaddle/Paddle/pull/59607),[#63093](https://github.com/PaddlePaddle/Paddle/pull/63093),[#63887](https://github.com/PaddlePaddle/Paddle/pull/63887),[#62969](https://github.com/PaddlePaddle/Paddle/pull/62969),[#64007](https://github.com/PaddlePaddle/Paddle/pull/64007),[#59811](https://github.com/PaddlePaddle/Paddle/pull/59811),[#63045](https://github.com/PaddlePaddle/Paddle/pull/63045),[#60235](https://github.com/PaddlePaddle/Paddle/pull/60235),[#60240](https://github.com/PaddlePaddle/Paddle/pull/60240),[#60235](https://github.com/PaddlePaddle/Paddle/pull/60235),[#61411](https://github.com/PaddlePaddle/Paddle/pull/61411),[#61944](https://github.com/PaddlePaddle/Paddle/pull/61944),[#61961](https://github.com/PaddlePaddle/Paddle/pull/61961),[#59990](https://github.com/PaddlePaddle/Paddle/pull/59990),[#59478](https://github.com/PaddlePaddle/Paddle/pull/59478),[#61501](https://github.com/PaddlePaddle/Paddle/pull/61501),[#60066](https://github.com/PaddlePaddle/Paddle/pull/60066),[#64133](https://github.com/PaddlePaddle/Paddle/pull/64133),[#64231](https://github.com/PaddlePaddle/Paddle/pull/64231),[#60087](https://github.com/PaddlePaddle/Paddle/pull/60087),[#60348](https://github.com/PaddlePaddle/Paddle/pull/60348),[#60737](https://github.com/PaddlePaddle/Paddle/pull/60737),[#61364](https://github.com/PaddlePaddle/Paddle/pull/61364),[#63214](https://github.com/PaddlePaddle/Paddle/pull/63214),[#62454](https://github.com/PaddlePaddle/Paddle/pull/62454),[#62473](https://github.com/PaddlePaddle/Paddle/pull/62473),[#63692](https://github.com/PaddlePaddle/Paddle/pull/63692),[#63950](https://github.com/PaddlePaddle/Paddle/pull/63950) -- 支持在 linux 和 windowx 下 C++单测链接动态库,大幅减少 C++单测的体积大小和整个 build 
目录大小。[#60008](https://github.com/PaddlePaddle/Paddle/pull/60008),[#60960](https://github.com/PaddlePaddle/Paddle/pull/60960),[#60960](https://github.com/PaddlePaddle/Paddle/pull/60960),[#60961](https://github.com/PaddlePaddle/Paddle/pull/60961),[#60831](https://github.com/PaddlePaddle/Paddle/pull/60831),[#60832](https://github.com/PaddlePaddle/Paddle/pull/60832),[#60833](https://github.com/PaddlePaddle/Paddle/pull/60833),[#61372](https://github.com/PaddlePaddle/Paddle/pull/61372),[#60834](https://github.com/PaddlePaddle/Paddle/pull/60834),[#61374](https://github.com/PaddlePaddle/Paddle/pull/61374),[#61463](https://github.com/PaddlePaddle/Paddle/pull/61463),[#61376](https://github.com/PaddlePaddle/Paddle/pull/61376),[#60830](https://github.com/PaddlePaddle/Paddle/pull/60830),[#61373](https://github.com/PaddlePaddle/Paddle/pull/61373),[#61672](https://github.com/PaddlePaddle/Paddle/pull/61672),[#61375](https://github.com/PaddlePaddle/Paddle/pull/61375),[#61676](https://github.com/PaddlePaddle/Paddle/pull/61676),[#62036](https://github.com/PaddlePaddle/Paddle/pull/62036),[#61945](https://github.com/PaddlePaddle/Paddle/pull/61945),[#61675](https://github.com/PaddlePaddle/Paddle/pull/61675),[#61674](https://github.com/PaddlePaddle/Paddle/pull/61674),[#62773](https://github.com/PaddlePaddle/Paddle/pull/62773),[#61238](https://github.com/PaddlePaddle/Paddle/pull/61238),[#59988](https://github.com/PaddlePaddle/Paddle/pull/59988),[#60307](https://github.com/PaddlePaddle/Paddle/pull/60307),[#59612](https://github.com/PaddlePaddle/Paddle/pull/59612),[#59942](https://github.com/PaddlePaddle/Paddle/pull/59942),[#59968](https://github.com/PaddlePaddle/Paddle/pull/59968),[#59978](https://github.com/PaddlePaddle/Paddle/pull/59978),[#60121](https://github.com/PaddlePaddle/Paddle/pull/60121),[#60149](https://github.com/PaddlePaddle/Paddle/pull/60149),[#60161](https://github.com/PaddlePaddle/Paddle/pull/60161),[#60160](https://github.com/PaddlePaddle/Paddle/pull/60160),[#60230](https:
//github.com/PaddlePaddle/Paddle/pull/60230),[#60154](https://github.com/PaddlePaddle/Paddle/pull/60154),[#60356](https://github.com/PaddlePaddle/Paddle/pull/60356),[#60392](https://github.com/PaddlePaddle/Paddle/pull/60392),[#60517](https://github.com/PaddlePaddle/Paddle/pull/60517),[#61131](https://github.com/PaddlePaddle/Paddle/pull/61131),[#60959](https://github.com/PaddlePaddle/Paddle/pull/60959) -- 新增对 Clang 编译器的支持,用户现在可以使用 Clang 进行编译,享受更快的编译速度和更好的报错信息提示。[#63382](https://github.com/PaddlePaddle/Paddle/pull/63382),[#63133](https://github.com/PaddlePaddle/Paddle/pull/63133),[#61705](https://github.com/PaddlePaddle/Paddle/pull/61705),[#63152](https://github.com/PaddlePaddle/Paddle/pull/63152),[#63373](https://github.com/PaddlePaddle/Paddle/pull/63373) - -### CI 流水线改进 -- 对 CI 流水线中的合入代码监测机制进行了完善,确保更高的代码质量和稳定性。新增了功能监控模块,实时监控 CI 流水线的各项指标,确保每个阶段的顺利执行,及时发现和解决问题。[#61384](https://github.com/PaddlePaddle/Paddle/pull/61384),[#62190](https://github.com/PaddlePaddle/Paddle/pull/62190),[#60758](https://github.com/PaddlePaddle/Paddle/pull/60758),[#60399](https://github.com/PaddlePaddle/Paddle/pull/60399),[#58623](https://github.com/PaddlePaddle/Paddle/pull/58623),[#62177](https://github.com/PaddlePaddle/Paddle/pull/62177),[#62361](https://github.com/PaddlePaddle/Paddle/pull/62361),[#62893](https://github.com/PaddlePaddle/Paddle/pull/62893),[#63705](https://github.com/PaddlePaddle/Paddle/pull/63705),[#64476](https://github.com/PaddlePaddle/Paddle/pull/64476),[#64752](https://github.com/PaddlePaddle/Paddle/pull/64752),[#64733](https://github.com/PaddlePaddle/Paddle/pull/64733),[#61914](https://github.com/PaddlePaddle/Paddle/pull/61914) - -### 代码清理 -- 
删除了一些老旧的代码。[#63580](https://github.com/PaddlePaddle/Paddle/pull/63580),[#62840](https://github.com/PaddlePaddle/Paddle/pull/62840),[#62886](https://github.com/PaddlePaddle/Paddle/pull/62886),[#63046](https://github.com/PaddlePaddle/Paddle/pull/63046),[#63004](https://github.com/PaddlePaddle/Paddle/pull/63004),[#63039](https://github.com/PaddlePaddle/Paddle/pull/63039),[#62733](https://github.com/PaddlePaddle/Paddle/pull/62733),[#62773](https://github.com/PaddlePaddle/Paddle/pull/62773),[#62768](https://github.com/PaddlePaddle/Paddle/pull/62768),[#62744](https://github.com/PaddlePaddle/Paddle/pull/62744),[#62861](https://github.com/PaddlePaddle/Paddle/pull/62861),[#62774](https://github.com/PaddlePaddle/Paddle/pull/62774),[#62851](https://github.com/PaddlePaddle/Paddle/pull/62851),[#62973](https://github.com/PaddlePaddle/Paddle/pull/62973),[#63273](https://github.com/PaddlePaddle/Paddle/pull/63273),[#62445](https://github.com/PaddlePaddle/Paddle/pull/62445),[#64382](https://github.com/PaddlePaddle/Paddle/pull/64382),[#64409](https://github.com/PaddlePaddle/Paddle/pull/64409),[#64391](https://github.com/PaddlePaddle/Paddle/pull/64391),[#64310](https://github.com/PaddlePaddle/Paddle/pull/64310),[#64348](https://github.com/PaddlePaddle/Paddle/pull/64348),[#64651](https://github.com/PaddlePaddle/Paddle/pull/64651),[#64709](https://github.com/PaddlePaddle/Paddle/pull/64709),[#61714](https://github.com/PaddlePaddle/Paddle/pull/61714),[#62109](https://github.com/PaddlePaddle/Paddle/pull/62109),[#61751](https://github.com/PaddlePaddle/Paddle/pull/61751),[#61691](https://github.com/PaddlePaddle/Paddle/pull/61691),[#61735](https://github.com/PaddlePaddle/Paddle/pull/61735) + +- 修复 Predictor 在保存/加载 PIR 模型时的相关问题。
[#65180](https://github.com/PaddlePaddle/Paddle/pull/65180),[#65019](https://github.com/PaddlePaddle/Paddle/pull/65019),[#65714](https://github.com/PaddlePaddle/Paddle/pull/65714),[#69619](https://github.com/PaddlePaddle/Paddle/pull/69619),[#67570](https://github.com/PaddlePaddle/Paddle/pull/67570),[#65595](https://github.com/PaddlePaddle/Paddle/pull/65595),[#69200](https://github.com/PaddlePaddle/Paddle/pull/69200) +- 修复推理单测在 PIR、多硬件等场景下的执行问题。[#65763](https://github.com/PaddlePaddle/Paddle/pull/65763),[#66481](https://github.com/PaddlePaddle/Paddle/pull/66481),[#67105](https://github.com/PaddlePaddle/Paddle/pull/67105),[#67248](https://github.com/PaddlePaddle/Paddle/pull/67248),[#67470](https://github.com/PaddlePaddle/Paddle/pull/67470),[#67638](https://github.com/PaddlePaddle/Paddle/pull/67638),[#68135](https://github.com/PaddlePaddle/Paddle/pull/68135),[#68191](https://github.com/PaddlePaddle/Paddle/pull/68191),[#68211](https://github.com/PaddlePaddle/Paddle/pull/68211),[#68160](https://github.com/PaddlePaddle/Paddle/pull/68160),[#68185](https://github.com/PaddlePaddle/Paddle/pull/68185),[#68127](https://github.com/PaddlePaddle/Paddle/pull/68127),[#68887](https://github.com/PaddlePaddle/Paddle/pull/68887),[#69191](https://github.com/PaddlePaddle/Paddle/pull/69191), [#70961](https://github.com/PaddlePaddle/Paddle/pull/70961),[#68020](https://github.com/PaddlePaddle/Paddle/pull/68020),[#67923](https://github.com/PaddlePaddle/Paddle/pull/67923),[#67963](https://github.com/PaddlePaddle/Paddle/pull/67963),[#68482](https://github.com/PaddlePaddle/Paddle/pull/68482),[#68546](https://github.com/PaddlePaddle/Paddle/pull/68546),[#68593](https://github.com/PaddlePaddle/Paddle/pull/68593),[#68793](https://github.com/PaddlePaddle/Paddle/pull/68793) +- 修复 Paddle TensorRT 
转换与执行相关问题。[#66932](https://github.com/PaddlePaddle/Paddle/pull/66932),[#66655](https://github.com/PaddlePaddle/Paddle/pull/66655),[#67274](https://github.com/PaddlePaddle/Paddle/pull/67274),[#67504](https://github.com/PaddlePaddle/Paddle/pull/67504),[#65780](https://github.com/PaddlePaddle/Paddle/pull/65780),[#68170](https://github.com/PaddlePaddle/Paddle/pull/68170),[#68647](https://github.com/PaddlePaddle/Paddle/pull/68647),[#68776](https://github.com/PaddlePaddle/Paddle/pull/68776),[#69573](https://github.com/PaddlePaddle/Paddle/pull/69573),[#69598](https://github.com/PaddlePaddle/Paddle/pull/69598),[#69510](https://github.com/PaddlePaddle/Paddle/pull/69510),[#69864](https://github.com/PaddlePaddle/Paddle/pull/69864),[#69885](https://github.com/PaddlePaddle/Paddle/pull/69885),[#70161](https://github.com/PaddlePaddle/Paddle/pull/70161),[#70116](https://github.com/PaddlePaddle/Paddle/pull/70116),[#70791](https://github.com/PaddlePaddle/Paddle/pull/70791),[#70801](https://github.com/PaddlePaddle/Paddle/pull/70801),[#70824](https://github.com/PaddlePaddle/Paddle/pull/70824),[#70939](https://github.com/PaddlePaddle/Paddle/pull/70939), [#71143](https://github.com/PaddlePaddle/Paddle/pull/71143),[#71154](https://github.com/PaddlePaddle/Paddle/pull/71154),[#71163](https://github.com/PaddlePaddle/Paddle/pull/71163),[#71183](https://github.com/PaddlePaddle/Paddle/pull/71183),[#71233](https://github.com/PaddlePaddle/Paddle/pull/71233),[#71287](https://github.com/PaddlePaddle/Paddle/pull/71287),[#71319](https://github.com/PaddlePaddle/Paddle/pull/71319),[#67720](https://github.com/PaddlePaddle/Paddle/pull/67720),[#69671](https://github.com/PaddlePaddle/Paddle/pull/69671),[#70168](https://github.com/PaddlePaddle/Paddle/pull/70168),[#69957](https://github.com/PaddlePaddle/Paddle/pull/69957) +- Paddle Inference 
编译链接相关问题修复。[#65846](https://github.com/PaddlePaddle/Paddle/pull/65846),[#67081](https://github.com/PaddlePaddle/Paddle/pull/67081),[#63184](https://github.com/PaddlePaddle/Paddle/pull/63184) +- 量化问题修复。[#67839](https://github.com/PaddlePaddle/Paddle/pull/67839),[#68049](https://github.com/PaddlePaddle/Paddle/pull/68049),[#70099](https://github.com/PaddlePaddle/Paddle/pull/70099),[#64878](https://github.com/PaddlePaddle/Paddle/pull/64878),[#65717](https://github.com/PaddlePaddle/Paddle/pull/65717),[#67552](https://github.com/PaddlePaddle/Paddle/pull/67552),[#67715](https://github.com/PaddlePaddle/Paddle/pull/67715) +- OneDNN 推理问题修复。[#67836](https://github.com/PaddlePaddle/Paddle/pull/67836),[#68021](https://github.com/PaddlePaddle/Paddle/pull/68021),[#68132](https://github.com/PaddlePaddle/Paddle/pull/68132),[#71426](https://github.com/PaddlePaddle/Paddle/pull/71426),[#68057](https://github.com/PaddlePaddle/Paddle/pull/68057) +- 内存问题修复。[#68631](https://github.com/PaddlePaddle/Paddle/pull/68631),[#69129](https://github.com/PaddlePaddle/Paddle/pull/69129),[#70314](https://github.com/PaddlePaddle/Paddle/pull/70314),[#67863](https://github.com/PaddlePaddle/Paddle/pull/67863) +- Paddle Inference 支持 OpenVINO 问题修复。[#70212](https://github.com/PaddlePaddle/Paddle/pull/70212),[#70288](https://github.com/PaddlePaddle/Paddle/pull/70288) +- Pass
相关问题修复。[#65349](https://github.com/PaddlePaddle/Paddle/pull/65349),[#65421](https://github.com/PaddlePaddle/Paddle/pull/65421),[#65677](https://github.com/PaddlePaddle/Paddle/pull/65677),[#66850](https://github.com/PaddlePaddle/Paddle/pull/66850),[#67443](https://github.com/PaddlePaddle/Paddle/pull/67443),[#67620](https://github.com/PaddlePaddle/Paddle/pull/67620),[#68158](https://github.com/PaddlePaddle/Paddle/pull/68158),[#68642](https://github.com/PaddlePaddle/Paddle/pull/68642),[#68837](https://github.com/PaddlePaddle/Paddle/pull/68837),[#68880](https://github.com/PaddlePaddle/Paddle/pull/68880),[#68935](https://github.com/PaddlePaddle/Paddle/pull/68935),[#69112](https://github.com/PaddlePaddle/Paddle/pull/69112),[#69205](https://github.com/PaddlePaddle/Paddle/pull/69205),[#69242](https://github.com/PaddlePaddle/Paddle/pull/69242),[#69352](https://github.com/PaddlePaddle/Paddle/pull/69352),[#69421](https://github.com/PaddlePaddle/Paddle/pull/69421),[#69690](https://github.com/PaddlePaddle/Paddle/pull/69690) +- 其他类问题修复。[#70237](https://github.com/PaddlePaddle/Paddle/pull/70237),[#68173](https://github.com/PaddlePaddle/Paddle/pull/68173) +- 修复 fused_moe 相关问题(测试/GEMM/WINT4/多架构兼容性/Bias 可选)[#67353](https://github.com/PaddlePaddle/Paddle/pull/67353), [#67396](https://github.com/PaddlePaddle/Paddle/pull/67396), [#67717](https://github.com/PaddlePaddle/Paddle/pull/67717), [#67794](https://github.com/PaddlePaddle/Paddle/pull/67794), [#67783](https://github.com/PaddlePaddle/Paddle/pull/67783) +- 修复 block_attention 系列问题(GQA 差异/越界风险/多头支持)[#67175](https://github.com/PaddlePaddle/Paddle/pull/67175), [#69001](https://github.com/PaddlePaddle/Paddle/pull/69001), [#70763](https://github.com/PaddlePaddle/Paddle/pull/70763) +- 修复 PIR 相关问题(布局转换/BF16 替换错误)[#66977](https://github.com/PaddlePaddle/Paddle/pull/66977), [#67830](https://github.com/PaddlePaddle/Paddle/pull/67830) +- 修复分布式相关(allreduce 数据类型/参数同步)[#67449](https://github.com/PaddlePaddle/Paddle/pull/67449),
[#69157](https://github.com/PaddlePaddle/Paddle/pull/69157) +- 修复内核执行问题(前向反向冲突/默认流 argsort)[#67218](https://github.com/PaddlePaddle/Paddle/pull/67218), [#68374](https://github.com/PaddlePaddle/Paddle/pull/68374) +- 其他关键修复(减小 C++库体积/修复 NeoX 格式下的 RoPE 计算/修复静态图执行)[#66041](https://github.com/PaddlePaddle/Paddle/pull/66041), [#66583](https://github.com/PaddlePaddle/Paddle/pull/66583), [#67580](https://github.com/PaddlePaddle/Paddle/pull/67580) + +### 其他修改 + +- 代码清理与维护(API 弃用/编译警告修复)[#68048](https://github.com/PaddlePaddle/Paddle/pull/68048), [#70384](https://github.com/PaddlePaddle/Paddle/pull/70384) +- 第三方集成优化(OpenVINO 子模块管理)[#70313](https://github.com/PaddlePaddle/Paddle/pull/70313), [#70425](https://github.com/PaddlePaddle/Paddle/pull/70425) + +## 8. 硬件适配 + +针对昆仑、海光等平台持续进行功能完善和升级,提升用户体验。 + +### 新功能 + +昆仑芯 XPU 上进行 OP 的添加和功能的完善,涉及的 ops 包括:flash attention/flash_attn_unpadded、multinomial、matmul、repeat_interleave、logsumexp、index_put_grad、mean_grad、pow、pow_grad、rsqrt、full、rms_norm、rms_norm_grad、put_along_axis、cumsum、argmin、masked_select/grad、expand_v2/grad、all2all、expand、reduce_sum、reduce_max、reduce_min、moe、fused_linear_param_grad_add、adamw、clip/clip_grad、tan、acos、blha_get_max_len、gather/gather_grad、scatter/scatter_grad、round、index_select/index_select_grad、isfinite、isinf、quantize_linear、dequantize_linear、conv3d_transpose、logsumexp_grad、index_add_grad、eye、gather_element、tril、triu、set_value_grad、argmax、take_along_axis 等 +[#65413](https://github.com/PaddlePaddle/Paddle/pull/65413), [#64846](https://github.com/PaddlePaddle/Paddle/pull/64846), [#65656](https://github.com/PaddlePaddle/Paddle/pull/65656), [#65963](https://github.com/PaddlePaddle/Paddle/pull/65963), [#66143](https://github.com/PaddlePaddle/Paddle/pull/66143), [#66482](https://github.com/PaddlePaddle/Paddle/pull/66482), [#66585](https://github.com/PaddlePaddle/Paddle/pull/66585), [#67077](https://github.com/PaddlePaddle/Paddle/pull/67077), [#67173](https://github.com/PaddlePaddle/Paddle/pull/67173),
[#67551](https://github.com/PaddlePaddle/Paddle/pull/67551), [#63989](https://github.com/PaddlePaddle/Paddle/pull/63989), [#67919](https://github.com/PaddlePaddle/Paddle/pull/67919), [#68052](https://github.com/PaddlePaddle/Paddle/pull/68052), [#68176](https://github.com/PaddlePaddle/Paddle/pull/68176), [#68408](https://github.com/PaddlePaddle/Paddle/pull/68408), [#68454](https://github.com/PaddlePaddle/Paddle/pull/68454), [#68478](https://github.com/PaddlePaddle/Paddle/pull/68478), [#68473](https://github.com/PaddlePaddle/Paddle/pull/68473), [#68453](https://github.com/PaddlePaddle/Paddle/pull/68453), [#68770](https://github.com/PaddlePaddle/Paddle/pull/68770), [#68933](https://github.com/PaddlePaddle/Paddle/pull/68933), [#69042](https://github.com/PaddlePaddle/Paddle/pull/69042), [#68713](https://github.com/PaddlePaddle/Paddle/pull/68713), [#69368](https://github.com/PaddlePaddle/Paddle/pull/69368), [#69723](https://github.com/PaddlePaddle/Paddle/pull/69723), [#69767](https://github.com/PaddlePaddle/Paddle/pull/69767), [#69898](https://github.com/PaddlePaddle/Paddle/pull/69898), [#69970](https://github.com/PaddlePaddle/Paddle/pull/69970), [#69771](https://github.com/PaddlePaddle/Paddle/pull/69771), [#70176](https://github.com/PaddlePaddle/Paddle/pull/70176), [#70428](https://github.com/PaddlePaddle/Paddle/pull/70428), [#70573](https://github.com/PaddlePaddle/Paddle/pull/70573), [#70576](https://github.com/PaddlePaddle/Paddle/pull/70576), [#70633](https://github.com/PaddlePaddle/Paddle/pull/70633), [#70114](https://github.com/PaddlePaddle/Paddle/pull/70114), [#70627](https://github.com/PaddlePaddle/Paddle/pull/70627), [#71038](https://github.com/PaddlePaddle/Paddle/pull/71038), [#71132](https://github.com/PaddlePaddle/Paddle/pull/71132), [#71228](https://github.com/PaddlePaddle/Paddle/pull/71228), [#71274](https://github.com/PaddlePaddle/Paddle/pull/71274), [#71364](https://github.com/PaddlePaddle/Paddle/pull/71364), 
[#71375](https://github.com/PaddlePaddle/Paddle/pull/71375), [#71431](https://github.com/PaddlePaddle/Paddle/pull/71431), [#71451](https://github.com/PaddlePaddle/Paddle/pull/71451), [#67585](https://github.com/PaddlePaddle/Paddle/pull/67585), [#67637](https://github.com/PaddlePaddle/Paddle/pull/67637), [#67914](https://github.com/PaddlePaddle/Paddle/pull/67914), [#67641](https://github.com/PaddlePaddle/Paddle/pull/67641), [#67913](https://github.com/PaddlePaddle/Paddle/pull/67913), [#67955](https://github.com/PaddlePaddle/Paddle/pull/67955), [#68411](https://github.com/PaddlePaddle/Paddle/pull/68411), [#68560](https://github.com/PaddlePaddle/Paddle/pull/68560), [#68423](https://github.com/PaddlePaddle/Paddle/pull/68423), [#68894](https://github.com/PaddlePaddle/Paddle/pull/68894), [#71053](https://github.com/PaddlePaddle/Paddle/pull/71053), [#71047](https://github.com/PaddlePaddle/Paddle/pull/71047), [#69056](https://github.com/PaddlePaddle/Paddle/pull/69056), [#70843](https://github.com/PaddlePaddle/Paddle/pull/70843), [#65653](https://github.com/PaddlePaddle/Paddle/pull/65653), [#68023](https://github.com/PaddlePaddle/Paddle/pull/68023), [#67780](https://github.com/PaddlePaddle/Paddle/pull/67780), [#68622](https://github.com/PaddlePaddle/Paddle/pull/68622), [#67215](https://github.com/PaddlePaddle/Paddle/pull/67215) + +海光 DCU 上添加 rocsolver、warpctc 的支持,并进行 OP 的添加和功能的完善,涉及的 ops 包括:flash_attention、hipblaslt、fastgelu、multiclass_nms3 + +[#68066](https://github.com/PaddlePaddle/Paddle/pull/68066), [#69457](https://github.com/PaddlePaddle/Paddle/pull/69457), [#68603](https://github.com/PaddlePaddle/Paddle/pull/68603), [#65599](https://github.com/PaddlePaddle/Paddle/pull/65599), [#70587](https://github.com/PaddlePaddle/Paddle/pull/70587), [#71337](https://github.com/PaddlePaddle/Paddle/pull/71337), [#70173](https://github.com/PaddlePaddle/Paddle/pull/70173) + ### Bug 修复 -- 修复多个 paddle 
框架的编译问题。[#63297](https://github.com/PaddlePaddle/Paddle/pull/63297),[#62994](https://github.com/PaddlePaddle/Paddle/pull/62994),[#62651](https://github.com/PaddlePaddle/Paddle/pull/62651),[#64408](https://github.com/PaddlePaddle/Paddle/pull/64408),[#60934](https://github.com/PaddlePaddle/Paddle/pull/60934),[#62899](https://github.com/PaddlePaddle/Paddle/pull/62899),[#60528](https://github.com/PaddlePaddle/Paddle/pull/60528),[#63158](https://github.com/PaddlePaddle/Paddle/pull/63158),[#64549](https://github.com/PaddlePaddle/Paddle/pull/64549),[#62351](https://github.com/PaddlePaddle/Paddle/pull/62351),[#61259](https://github.com/PaddlePaddle/Paddle/pull/61259),[#61281](https://github.com/PaddlePaddle/Paddle/pull/61281),[#62304](https://github.com/PaddlePaddle/Paddle/pull/62304),[#60736](https://github.com/PaddlePaddle/Paddle/pull/60736),[#60811](https://github.com/PaddlePaddle/Paddle/pull/60811),[#63949](https://github.com/PaddlePaddle/Paddle/pull/63949),[#59892](https://github.com/PaddlePaddle/Paddle/pull/59892),[#60767](https://github.com/PaddlePaddle/Paddle/pull/60767),[#60856](https://github.com/PaddlePaddle/Paddle/pull/60856),[#61286](https://github.com/PaddlePaddle/Paddle/pull/61286),[#61638](https://github.com/PaddlePaddle/Paddle/pull/61638),[#62079](https://github.com/PaddlePaddle/Paddle/pull/62079),[#62142](https://github.com/PaddlePaddle/Paddle/pull/62142),[#62823](https://github.com/PaddlePaddle/Paddle/pull/62823),[#62814](https://github.com/PaddlePaddle/Paddle/pull/62814),[#62425](https://github.com/PaddlePaddle/Paddle/pull/62425),[#62619](https://github.com/PaddlePaddle/Paddle/pull/62619),[#60207](https://github.com/PaddlePaddle/Paddle/pull/60207),[#60765](https://github.com/PaddlePaddle/Paddle/pull/60765),[#61870](https://github.com/PaddlePaddle/Paddle/pull/61870),[#61923](https://github.com/PaddlePaddle/Paddle/pull/61923),[#62144](https://github.com/PaddlePaddle/Paddle/pull/62144),[#62426](https://github.com/PaddlePaddle/Paddle/pull/62426),[#63848](htt
ps://github.com/PaddlePaddle/Paddle/pull/63848),[#60682](https://github.com/PaddlePaddle/Paddle/pull/60682),[#61369](https://github.com/PaddlePaddle/Paddle/pull/61369),[#62882](https://github.com/PaddlePaddle/Paddle/pull/62882),[#63944](https://github.com/PaddlePaddle/Paddle/pull/63944),[#64812](https://github.com/PaddlePaddle/Paddle/pull/64812),[#60654](https://github.com/PaddlePaddle/Paddle/pull/60654),[#60887](https://github.com/PaddlePaddle/Paddle/pull/60887),[#62058](https://github.com/PaddlePaddle/Paddle/pull/62058),[#64639](https://github.com/PaddlePaddle/Paddle/pull/64639),[#60115](https://github.com/PaddlePaddle/Paddle/pull/60115),[#61940](https://github.com/PaddlePaddle/Paddle/pull/61940),[#62614](https://github.com/PaddlePaddle/Paddle/pull/62614),[#59914](https://github.com/PaddlePaddle/Paddle/pull/59914),[#63762](https://github.com/PaddlePaddle/Paddle/pull/63762),[#60145](https://github.com/PaddlePaddle/Paddle/pull/60145),[#60285](https://github.com/PaddlePaddle/Paddle/pull/60285),[#60378](https://github.com/PaddlePaddle/Paddle/pull/60378),[#60393](https://github.com/PaddlePaddle/Paddle/pull/60393),[#61057](https://github.com/PaddlePaddle/Paddle/pull/61057),[#61058](https://github.com/PaddlePaddle/Paddle/pull/61058),[#61151](https://github.com/PaddlePaddle/Paddle/pull/61151),[#61347](https://github.com/PaddlePaddle/Paddle/pull/61347),[#61554](https://github.com/PaddlePaddle/Paddle/pull/61554),[#61844](https://github.com/PaddlePaddle/Paddle/pull/61844),[#62915](https://github.com/PaddlePaddle/Paddle/pull/62915),[#61852](https://github.com/PaddlePaddle/Paddle/pull/61852),[#61704](https://github.com/PaddlePaddle/Paddle/pull/61704),[#61991](https://github.com/PaddlePaddle/Paddle/pull/61991),[#62264](https://github.com/PaddlePaddle/Paddle/pull/62264),[#62762](https://github.com/PaddlePaddle/Paddle/pull/62762),[#63820](https://github.com/PaddlePaddle/Paddle/pull/63820),[#63864](https://github.com/PaddlePaddle/Paddle/pull/63864),[#65017](https://github.com/Padd
lePaddle/Paddle/pull/65017),[#61183](https://github.com/PaddlePaddle/Paddle/pull/61183),[#59866](https://github.com/PaddlePaddle/Paddle/pull/59866),[#61171](https://github.com/PaddlePaddle/Paddle/pull/61171),[#61290](https://github.com/PaddlePaddle/Paddle/pull/61290),[#61725](https://github.com/PaddlePaddle/Paddle/pull/61725),[#61614](https://github.com/PaddlePaddle/Paddle/pull/61614),[#61721](https://github.com/PaddlePaddle/Paddle/pull/61721),[#61494](https://github.com/PaddlePaddle/Paddle/pull/61494),[#61556](https://github.com/PaddlePaddle/Paddle/pull/61556),[#61689](https://github.com/PaddlePaddle/Paddle/pull/61689) -## 11.文档相关的问题修复 -- 随着 API 功能增强工作的开展,对部分 API 文档也同步进行了修正和增强。[#62875](https://github.com/PaddlePaddle/Paddle/pull/62875), [#59793](https://github.com/PaddlePaddle/Paddle/pull/59793), [#60002](https://github.com/PaddlePaddle/Paddle/pull/60002), [#59985](https://github.com/PaddlePaddle/Paddle/pull/59985), [#63365](https://github.com/PaddlePaddle/Paddle/pull/63365), [#60962](https://github.com/PaddlePaddle/Paddle/pull/60962), [#60942](https://github.com/PaddlePaddle/Paddle/pull/60942), [#64232](https://github.com/PaddlePaddle/Paddle/pull/64232), [#63255](https://github.com/PaddlePaddle/Paddle/pull/63255) -- 更新/补充 API 文档。bernoulli_ ([#64504](https://github.com/PaddlePaddle/Paddle/pull/64504)),paddle.static.ctr_metric_bundle ([#60912](https://github.com/PaddlePaddle/Paddle/pull/60912)),LayerNorm ([#62928](https://github.com/PaddlePaddle/Paddle/pull/62928)),Sequential ([#63128](https://github.com/PaddlePaddle/Paddle/pull/63128)),paddle.summary ([#63121](https://github.com/PaddlePaddle/Paddle/pull/63121)),AutoParallel 中的 ShardOptimizer ([#62933](https://github.com/PaddlePaddle/Paddle/pull/62933)),paddle.nccl.version ([#62480](https://github.com/PaddlePaddle/Paddle/pull/62480)) -- 更新 Readme 
文件。[#59883](https://github.com/PaddlePaddle/Paddle/pull/59883),[#60691](https://github.com/PaddlePaddle/Paddle/pull/60691),[#60749](https://github.com/PaddlePaddle/Paddle/pull/60749) -- 将 mkldnn 更新为 onednn。[#63199](https://github.com/PaddlePaddle/Paddle/pull/63199),[#63202](https://github.com/PaddlePaddle/Paddle/pull/63202),[#63215](https://github.com/PaddlePaddle/Paddle/pull/63215),[#63209](https://github.com/PaddlePaddle/Paddle/pull/63209) -- 修复文档渲染错误。[#59725](https://github.com/PaddlePaddle/Paddle/pull/59725),[#60306](https://github.com/PaddlePaddle/Paddle/pull/60306) -- 修改了代码中大量的错别字,增强源码可读性。[#60093](https://github.com/PaddlePaddle/Paddle/pull/60093),[#60603](https://github.com/PaddlePaddle/Paddle/pull/60603),[#60631](https://github.com/PaddlePaddle/Paddle/pull/60631),[#60679](https://github.com/PaddlePaddle/Paddle/pull/60679),[#60741](https://github.com/PaddlePaddle/Paddle/pull/60741),[#60770](https://github.com/PaddlePaddle/Paddle/pull/60770),[#60784](https://github.com/PaddlePaddle/Paddle/pull/60784),[#60825](https://github.com/PaddlePaddle/Paddle/pull/60825),[#60857](https://github.com/PaddlePaddle/Paddle/pull/60857),[#60891](https://github.com/PaddlePaddle/Paddle/pull/60891),[#60921](https://github.com/PaddlePaddle/Paddle/pull/60921),[#60920](https://github.com/PaddlePaddle/Paddle/pull/60920),[#60923](https://github.com/PaddlePaddle/Paddle/pull/60923),[#60928](https://github.com/PaddlePaddle/Paddle/pull/60928),[#60940](https://github.com/PaddlePaddle/Paddle/pull/60940),[#60936](https://github.com/PaddlePaddle/Paddle/pull/60936),[#60932](https://github.com/PaddlePaddle/Paddle/pull/60932),[#60935](https://github.com/PaddlePaddle/Paddle/pull/60935),[#60931](https://github.com/PaddlePaddle/Paddle/pull/60931),[#60951](https://github.com/PaddlePaddle/Paddle/pull/60951),[#60964](https://github.com/PaddlePaddle/Paddle/pull/60964),[#60965](https://github.com/PaddlePaddle/Paddle/pull/60965),[#60967](https://github.com/PaddlePaddle/Paddle/pull/60967),[#60972](https://g
ithub.com/PaddlePaddle/Paddle/pull/60972),[#60971](https://github.com/PaddlePaddle/Paddle/pull/60971),[#60980](https://github.com/PaddlePaddle/Paddle/pull/60980),[#60984](https://github.com/PaddlePaddle/Paddle/pull/60984),[#60985](https://github.com/PaddlePaddle/Paddle/pull/60985),[#60989](https://github.com/PaddlePaddle/Paddle/pull/60989),[#60990](https://github.com/PaddlePaddle/Paddle/pull/60990),[#60991](https://github.com/PaddlePaddle/Paddle/pull/60991),[#60992](https://github.com/PaddlePaddle/Paddle/pull/60992),[#60994](https://github.com/PaddlePaddle/Paddle/pull/60994),[#60995](https://github.com/PaddlePaddle/Paddle/pull/60995),[#60996](https://github.com/PaddlePaddle/Paddle/pull/60996),[#61001](https://github.com/PaddlePaddle/Paddle/pull/61001),[#61000](https://github.com/PaddlePaddle/Paddle/pull/61000),[#60999](https://github.com/PaddlePaddle/Paddle/pull/60999),[#60998](https://github.com/PaddlePaddle/Paddle/pull/60998),[#61026](https://github.com/PaddlePaddle/Paddle/pull/61026),[#61009](https://github.com/PaddlePaddle/Paddle/pull/61009),[#61034](https://github.com/PaddlePaddle/Paddle/pull/61034),[#61033](https://github.com/PaddlePaddle/Paddle/pull/61033),[#61020](https://github.com/PaddlePaddle/Paddle/pull/61020),[#61092](https://github.com/PaddlePaddle/Paddle/pull/61092),[#61066](https://github.com/PaddlePaddle/Paddle/pull/61066),[#61063](https://github.com/PaddlePaddle/Paddle/pull/61063),[#61089](https://github.com/PaddlePaddle/Paddle/pull/61089),[#61071](https://github.com/PaddlePaddle/Paddle/pull/61071),[#61129](https://github.com/PaddlePaddle/Paddle/pull/61129),[#61128](https://github.com/PaddlePaddle/Paddle/pull/61128),[#61126](https://github.com/PaddlePaddle/Paddle/pull/61126),[#61123](https://github.com/PaddlePaddle/Paddle/pull/61123),[#61113](https://github.com/PaddlePaddle/Paddle/pull/61113),[#61189](https://github.com/PaddlePaddle/Paddle/pull/61189),[#61175](https://github.com/PaddlePaddle/Paddle/pull/61175),[#61153](https://github.com/PaddlePadd
le/Paddle/pull/61153),[#61198](https://github.com/PaddlePaddle/Paddle/pull/61198),[#61206](https://github.com/PaddlePaddle/Paddle/pull/61206),[#61256](https://github.com/PaddlePaddle/Paddle/pull/61256),[#61255](https://github.com/PaddlePaddle/Paddle/pull/61255),[#61251](https://github.com/PaddlePaddle/Paddle/pull/61251),[#61246](https://github.com/PaddlePaddle/Paddle/pull/61246),[#61245](https://github.com/PaddlePaddle/Paddle/pull/61245),[#61231](https://github.com/PaddlePaddle/Paddle/pull/61231),[#61247](https://github.com/PaddlePaddle/Paddle/pull/61247),[#61265](https://github.com/PaddlePaddle/Paddle/pull/61265),[#61264](https://github.com/PaddlePaddle/Paddle/pull/61264),[#61266](https://github.com/PaddlePaddle/Paddle/pull/61266),[#61267](https://github.com/PaddlePaddle/Paddle/pull/61267),[#61268](https://github.com/PaddlePaddle/Paddle/pull/61268),[#61270](https://github.com/PaddlePaddle/Paddle/pull/61270),[#61334](https://github.com/PaddlePaddle/Paddle/pull/61334),[#61392](https://github.com/PaddlePaddle/Paddle/pull/61392),[#61404](https://github.com/PaddlePaddle/Paddle/pull/61404),[#61318](https://github.com/PaddlePaddle/Paddle/pull/61318),[#61383](https://github.com/PaddlePaddle/Paddle/pull/61383),[#61306](https://github.com/PaddlePaddle/Paddle/pull/61306),[#61324](https://github.com/PaddlePaddle/Paddle/pull/61324),[#61426](https://github.com/PaddlePaddle/Paddle/pull/61426),[#61390](https://github.com/PaddlePaddle/Paddle/pull/61390),[#61419](https://github.com/PaddlePaddle/Paddle/pull/61419),[#61420](https://github.com/PaddlePaddle/Paddle/pull/61420),[#61408](https://github.com/PaddlePaddle/Paddle/pull/61408),[#61425](https://github.com/PaddlePaddle/Paddle/pull/61425),[#61557](https://github.com/PaddlePaddle/Paddle/pull/61557),[#61628](https://github.com/PaddlePaddle/Paddle/pull/61628),[#61652](https://github.com/PaddlePaddle/Paddle/pull/61652),[#61602](https://github.com/PaddlePaddle/Paddle/pull/61602),[#61558](https://github.com/PaddlePaddle/Paddle/pull/61558
),[#61660](https://github.com/PaddlePaddle/Paddle/pull/61660),[#61423](https://github.com/PaddlePaddle/Paddle/pull/61423),[#61627](https://github.com/PaddlePaddle/Paddle/pull/61627),[#61685](https://github.com/PaddlePaddle/Paddle/pull/61685),[#61690](https://github.com/PaddlePaddle/Paddle/pull/61690),[#61727](https://github.com/PaddlePaddle/Paddle/pull/61727),[#61738](https://github.com/PaddlePaddle/Paddle/pull/61738),[#61740](https://github.com/PaddlePaddle/Paddle/pull/61740),[#61741](https://github.com/PaddlePaddle/Paddle/pull/61741),[#61743](https://github.com/PaddlePaddle/Paddle/pull/61743),[#61744](https://github.com/PaddlePaddle/Paddle/pull/61744),[#61745](https://github.com/PaddlePaddle/Paddle/pull/61745),[#61761](https://github.com/PaddlePaddle/Paddle/pull/61761),[#61762](https://github.com/PaddlePaddle/Paddle/pull/61762),[#61764](https://github.com/PaddlePaddle/Paddle/pull/61764),[#61767](https://github.com/PaddlePaddle/Paddle/pull/61767),[#61768](https://github.com/PaddlePaddle/Paddle/pull/61768),[#61774](https://github.com/PaddlePaddle/Paddle/pull/61774),[#61781](https://github.com/PaddlePaddle/Paddle/pull/61781),[#61783](https://github.com/PaddlePaddle/Paddle/pull/61783),[#61757](https://github.com/PaddlePaddle/Paddle/pull/61757),[#61732](https://github.com/PaddlePaddle/Paddle/pull/61732),[#61776](https://github.com/PaddlePaddle/Paddle/pull/61776),[#61780](https://github.com/PaddlePaddle/Paddle/pull/61780),[#61730](https://github.com/PaddlePaddle/Paddle/pull/61730),[#61728](https://github.com/PaddlePaddle/Paddle/pull/61728),[#61633](https://github.com/PaddlePaddle/Paddle/pull/61633),[#61720](https://github.com/PaddlePaddle/Paddle/pull/61720),[#61734](https://github.com/PaddlePaddle/Paddle/pull/61734),[#61779](https://github.com/PaddlePaddle/Paddle/pull/61779),[#61775](https://github.com/PaddlePaddle/Paddle/pull/61775),[#61773](https://github.com/PaddlePaddle/Paddle/pull/61773),[#61787](https://github.com/PaddlePaddle/Paddle/pull/61787),[#61687](https://g
ithub.com/PaddlePaddle/Paddle/pull/61687),[#61747](https://github.com/PaddlePaddle/Paddle/pull/61747),[#61760](https://github.com/PaddlePaddle/Paddle/pull/61760),[#61782](https://github.com/PaddlePaddle/Paddle/pull/61782),[#61800](https://github.com/PaddlePaddle/Paddle/pull/61800),[#61748](https://github.com/PaddlePaddle/Paddle/pull/61748),[#61772](https://github.com/PaddlePaddle/Paddle/pull/61772),[#61786](https://github.com/PaddlePaddle/Paddle/pull/61786),[#61880](https://github.com/PaddlePaddle/Paddle/pull/61880),[#61718](https://github.com/PaddlePaddle/Paddle/pull/61718),[#61742](https://github.com/PaddlePaddle/Paddle/pull/61742),[#61766](https://github.com/PaddlePaddle/Paddle/pull/61766),[#61835](https://github.com/PaddlePaddle/Paddle/pull/61835),[#61838](https://github.com/PaddlePaddle/Paddle/pull/61838),[#61754](https://github.com/PaddlePaddle/Paddle/pull/61754),[#61833](https://github.com/PaddlePaddle/Paddle/pull/61833),[#61749](https://github.com/PaddlePaddle/Paddle/pull/61749),[#61938](https://github.com/PaddlePaddle/Paddle/pull/61938),[#61919](https://github.com/PaddlePaddle/Paddle/pull/61919),[#61924](https://github.com/PaddlePaddle/Paddle/pull/61924),[#61778](https://github.com/PaddlePaddle/Paddle/pull/61778),[#61839](https://github.com/PaddlePaddle/Paddle/pull/61839),[#61879](https://github.com/PaddlePaddle/Paddle/pull/61879),[#61929](https://github.com/PaddlePaddle/Paddle/pull/61929),[#61801](https://github.com/PaddlePaddle/Paddle/pull/61801),[#61788](https://github.com/PaddlePaddle/Paddle/pull/61788),[#61999](https://github.com/PaddlePaddle/Paddle/pull/61999),[#61928](https://github.com/PaddlePaddle/Paddle/pull/61928),[#61958](https://github.com/PaddlePaddle/Paddle/pull/61958),[#61982](https://github.com/PaddlePaddle/Paddle/pull/61982),[#61996](https://github.com/PaddlePaddle/Paddle/pull/61996),[#61953](https://github.com/PaddlePaddle/Paddle/pull/61953),[#61998](https://github.com/PaddlePaddle/Paddle/pull/61998),[#62003](https://github.com/PaddlePadd
le/Paddle/pull/62003),[#61921](https://github.com/PaddlePaddle/Paddle/pull/61921),[#61881](https://github.com/PaddlePaddle/Paddle/pull/61881),[#61746](https://github.com/PaddlePaddle/Paddle/pull/61746),[#61955](https://github.com/PaddlePaddle/Paddle/pull/61955),[#62002](https://github.com/PaddlePaddle/Paddle/pull/62002),[#62001](https://github.com/PaddlePaddle/Paddle/pull/62001),[#61997](https://github.com/PaddlePaddle/Paddle/pull/61997),[#61765](https://github.com/PaddlePaddle/Paddle/pull/61765),[#61956](https://github.com/PaddlePaddle/Paddle/pull/61956),[#62004](https://github.com/PaddlePaddle/Paddle/pull/62004),[#62044](https://github.com/PaddlePaddle/Paddle/pull/62044),[#62040](https://github.com/PaddlePaddle/Paddle/pull/62040),[#62043](https://github.com/PaddlePaddle/Paddle/pull/62043),[#62042](https://github.com/PaddlePaddle/Paddle/pull/62042),[#62041](https://github.com/PaddlePaddle/Paddle/pull/62041),[#62039](https://github.com/PaddlePaddle/Paddle/pull/62039),[#62019](https://github.com/PaddlePaddle/Paddle/pull/62019),[#61910](https://github.com/PaddlePaddle/Paddle/pull/61910),[#61882](https://github.com/PaddlePaddle/Paddle/pull/61882),[#61836](https://github.com/PaddlePaddle/Paddle/pull/61836),[#62013](https://github.com/PaddlePaddle/Paddle/pull/62013),[#62055](https://github.com/PaddlePaddle/Paddle/pull/62055),[#62047](https://github.com/PaddlePaddle/Paddle/pull/62047),[#62000](https://github.com/PaddlePaddle/Paddle/pull/62000),[#62048](https://github.com/PaddlePaddle/Paddle/pull/62048),[#62075](https://github.com/PaddlePaddle/Paddle/pull/62075),[#62038](https://github.com/PaddlePaddle/Paddle/pull/62038),[#62045](https://github.com/PaddlePaddle/Paddle/pull/62045),[#62105](https://github.com/PaddlePaddle/Paddle/pull/62105),[#62214](https://github.com/PaddlePaddle/Paddle/pull/62214),[#62212](https://github.com/PaddlePaddle/Paddle/pull/62212),[#62183](https://github.com/PaddlePaddle/Paddle/pull/62183),[#62182](https://github.com/PaddlePaddle/Paddle/pull/62182
),[#62181](https://github.com/PaddlePaddle/Paddle/pull/62181),[#62179](https://github.com/PaddlePaddle/Paddle/pull/62179),[#62178](https://github.com/PaddlePaddle/Paddle/pull/62178),[#62172](https://github.com/PaddlePaddle/Paddle/pull/62172),[#62168](https://github.com/PaddlePaddle/Paddle/pull/62168),[#62163](https://github.com/PaddlePaddle/Paddle/pull/62163),[#62162](https://github.com/PaddlePaddle/Paddle/pull/62162),[#62161](https://github.com/PaddlePaddle/Paddle/pull/62161),[#62160](https://github.com/PaddlePaddle/Paddle/pull/62160),[#62046](https://github.com/PaddlePaddle/Paddle/pull/62046),[#62175](https://github.com/PaddlePaddle/Paddle/pull/62175),[#62259](https://github.com/PaddlePaddle/Paddle/pull/62259),[#62258](https://github.com/PaddlePaddle/Paddle/pull/62258),[#62213](https://github.com/PaddlePaddle/Paddle/pull/62213),[#62260](https://github.com/PaddlePaddle/Paddle/pull/62260),[#62290](https://github.com/PaddlePaddle/Paddle/pull/62290),[#62288](https://github.com/PaddlePaddle/Paddle/pull/62288),[#62323](https://github.com/PaddlePaddle/Paddle/pull/62323),[#62319](https://github.com/PaddlePaddle/Paddle/pull/62319),[#62331](https://github.com/PaddlePaddle/Paddle/pull/62331),[#62330](https://github.com/PaddlePaddle/Paddle/pull/62330),[#62329](https://github.com/PaddlePaddle/Paddle/pull/62329),[#62324](https://github.com/PaddlePaddle/Paddle/pull/62324),[#62317](https://github.com/PaddlePaddle/Paddle/pull/62317),[#62311](https://github.com/PaddlePaddle/Paddle/pull/62311),[#62310](https://github.com/PaddlePaddle/Paddle/pull/62310),[#62308](https://github.com/PaddlePaddle/Paddle/pull/62308),[#62289](https://github.com/PaddlePaddle/Paddle/pull/62289),[#62307](https://github.com/PaddlePaddle/Paddle/pull/62307),[#62315](https://github.com/PaddlePaddle/Paddle/pull/62315),[#62406](https://github.com/PaddlePaddle/Paddle/pull/62406),[#62458](https://github.com/PaddlePaddle/Paddle/pull/62458),[#62459](https://github.com/PaddlePaddle/Paddle/pull/62459),[#62481](https://g
ithub.com/PaddlePaddle/Paddle/pull/62481),[#62465](https://github.com/PaddlePaddle/Paddle/pull/62465),[#62462](https://github.com/PaddlePaddle/Paddle/pull/62462),[#62453](https://github.com/PaddlePaddle/Paddle/pull/62453),[#62496](https://github.com/PaddlePaddle/Paddle/pull/62496),[#62457](https://github.com/PaddlePaddle/Paddle/pull/62457),[#62537](https://github.com/PaddlePaddle/Paddle/pull/62537),[#62514](https://github.com/PaddlePaddle/Paddle/pull/62514),[#62548](https://github.com/PaddlePaddle/Paddle/pull/62548),[#62544](https://github.com/PaddlePaddle/Paddle/pull/62544),[#62575](https://github.com/PaddlePaddle/Paddle/pull/62575),[#62463](https://github.com/PaddlePaddle/Paddle/pull/62463),[#62643](https://github.com/PaddlePaddle/Paddle/pull/62643),[#62803](https://github.com/PaddlePaddle/Paddle/pull/62803),[#62924](https://github.com/PaddlePaddle/Paddle/pull/62924),[#63037](https://github.com/PaddlePaddle/Paddle/pull/63037),[#63102](https://github.com/PaddlePaddle/Paddle/pull/63102),[#63139](https://github.com/PaddlePaddle/Paddle/pull/63139),[#63092](https://github.com/PaddlePaddle/Paddle/pull/63092),[#63147](https://github.com/PaddlePaddle/Paddle/pull/63147),[#60518](https://github.com/PaddlePaddle/Paddle/pull/60518),[#60485](https://github.com/PaddlePaddle/Paddle/pull/60485),[#61273](https://github.com/PaddlePaddle/Paddle/pull/61273),[#63429](https://github.com/PaddlePaddle/Paddle/pull/63429),[#61954](https://github.com/PaddlePaddle/Paddle/pull/61954) +昆仑芯 XPU 上进行 OP 的 Bug 修复 +[#65020](https://github.com/PaddlePaddle/Paddle/pull/65020), [#65251](https://github.com/PaddlePaddle/Paddle/pull/65251), [#65418](https://github.com/PaddlePaddle/Paddle/pull/65418), [#65387](https://github.com/PaddlePaddle/Paddle/pull/65387), [#65525](https://github.com/PaddlePaddle/Paddle/pull/65525), [#65613](https://github.com/PaddlePaddle/Paddle/pull/65613), [#65533](https://github.com/PaddlePaddle/Paddle/pull/65533), [#65705](https://github.com/PaddlePaddle/Paddle/pull/65705), 
[#65915](https://github.com/PaddlePaddle/Paddle/pull/65915), [#66238](https://github.com/PaddlePaddle/Paddle/pull/66238), [#66485](https://github.com/PaddlePaddle/Paddle/pull/66485), [#67349](https://github.com/PaddlePaddle/Paddle/pull/67349), [#67372](https://github.com/PaddlePaddle/Paddle/pull/67372), [#67276](https://github.com/PaddlePaddle/Paddle/pull/67276), [#67460](https://github.com/PaddlePaddle/Paddle/pull/67460), [#67496](https://github.com/PaddlePaddle/Paddle/pull/67496), [#67530](https://github.com/PaddlePaddle/Paddle/pull/67530), [#67828](https://github.com/PaddlePaddle/Paddle/pull/67828), [#68010](https://github.com/PaddlePaddle/Paddle/pull/68010), [#68157](https://github.com/PaddlePaddle/Paddle/pull/68157), [#68172](https://github.com/PaddlePaddle/Paddle/pull/68172), [#68388](https://github.com/PaddlePaddle/Paddle/pull/68388), [#68213](https://github.com/PaddlePaddle/Paddle/pull/68213), [#68501](https://github.com/PaddlePaddle/Paddle/pull/68501), [#68504](https://github.com/PaddlePaddle/Paddle/pull/68504), [#68585](https://github.com/PaddlePaddle/Paddle/pull/68585), [#69229](https://github.com/PaddlePaddle/Paddle/pull/69229), [#69374](https://github.com/PaddlePaddle/Paddle/pull/69374), [#69424](https://github.com/PaddlePaddle/Paddle/pull/69424), [#69440](https://github.com/PaddlePaddle/Paddle/pull/69440), [#69614](https://github.com/PaddlePaddle/Paddle/pull/69614), [#68542](https://github.com/PaddlePaddle/Paddle/pull/68542), [#69990](https://github.com/PaddlePaddle/Paddle/pull/69990), [#70351](https://github.com/PaddlePaddle/Paddle/pull/70351), [#70479](https://github.com/PaddlePaddle/Paddle/pull/70479), [#70431](https://github.com/PaddlePaddle/Paddle/pull/70431), [#70638](https://github.com/PaddlePaddle/Paddle/pull/70638), [#70856](https://github.com/PaddlePaddle/Paddle/pull/70856), [#70974](https://github.com/PaddlePaddle/Paddle/pull/70974), [#70973](https://github.com/PaddlePaddle/Paddle/pull/70973), 
[#71027](https://github.com/PaddlePaddle/Paddle/pull/71027), [#71062](https://github.com/PaddlePaddle/Paddle/pull/71062), [#71115](https://github.com/PaddlePaddle/Paddle/pull/71115), [#71110](https://github.com/PaddlePaddle/Paddle/pull/71110), [#70858](https://github.com/PaddlePaddle/Paddle/pull/70858), [#71147](https://github.com/PaddlePaddle/Paddle/pull/71147), [#71212](https://github.com/PaddlePaddle/Paddle/pull/71212), [#71361](https://github.com/PaddlePaddle/Paddle/pull/71361), [#71423](https://github.com/PaddlePaddle/Paddle/pull/71423), [#70859](https://github.com/PaddlePaddle/Paddle/pull/70859), [#71492](https://github.com/PaddlePaddle/Paddle/pull/71492), [#71493](https://github.com/PaddlePaddle/Paddle/pull/71493), [#69826](https://github.com/PaddlePaddle/Paddle/pull/69826), [#67341](https://github.com/PaddlePaddle/Paddle/pull/67341), [#68906](https://github.com/PaddlePaddle/Paddle/pull/68906), [#71171](https://github.com/PaddlePaddle/Paddle/pull/71171) + +海光 DCU 上进行 OP 的 Bug 修复 +[#69617](https://github.com/PaddlePaddle/Paddle/pull/69617), [#65716](https://github.com/PaddlePaddle/Paddle/pull/65716), [#66630](https://github.com/PaddlePaddle/Paddle/pull/66630), [#65399](https://github.com/PaddlePaddle/Paddle/pull/65399) + +### 性能优化 + +昆仑芯 XPU 对 stream 等基础组件功能升级、对部分 op 的性能进行优化。 +[#65102](https://github.com/PaddlePaddle/Paddle/pull/65102), [#69727](https://github.com/PaddlePaddle/Paddle/pull/69727), [#69899](https://github.com/PaddlePaddle/Paddle/pull/69899), [#69942](https://github.com/PaddlePaddle/Paddle/pull/69942), [#70025](https://github.com/PaddlePaddle/Paddle/pull/70025), [#70640](https://github.com/PaddlePaddle/Paddle/pull/70640) + +### 硬件底层基础库升级 + +基础库的升级支持昆仑芯 P800,以及基础组件的支持 +[#65494](https://github.com/PaddlePaddle/Paddle/pull/65494), [#65924](https://github.com/PaddlePaddle/Paddle/pull/65924), [#69752](https://github.com/PaddlePaddle/Paddle/pull/69752), [#70835](https://github.com/PaddlePaddle/Paddle/pull/70835), 
[#65554](https://github.com/PaddlePaddle/Paddle/pull/65554), [#66998](https://github.com/PaddlePaddle/Paddle/pull/66998), [#65278](https://github.com/PaddlePaddle/Paddle/pull/65278), [#70614](https://github.com/PaddlePaddle/Paddle/pull/70614), [#71012](https://github.com/PaddlePaddle/Paddle/pull/71012), [#71178](https://github.com/PaddlePaddle/Paddle/pull/71178), [#71168](https://github.com/PaddlePaddle/Paddle/pull/71168), [#68740](https://github.com/PaddlePaddle/Paddle/pull/68740), [#71100](https://github.com/PaddlePaddle/Paddle/pull/71100), [#65221](https://github.com/PaddlePaddle/Paddle/pull/65221), [#67983](https://github.com/PaddlePaddle/Paddle/pull/67983) + +### 其他 + +op test 等相关模块修改 +[#65654](https://github.com/PaddlePaddle/Paddle/pull/65654), [#66233](https://github.com/PaddlePaddle/Paddle/pull/66233), [#66728](https://github.com/PaddlePaddle/Paddle/pull/66728), [#67959](https://github.com/PaddlePaddle/Paddle/pull/67959), [#68169](https://github.com/PaddlePaddle/Paddle/pull/68169), [#68418](https://github.com/PaddlePaddle/Paddle/pull/68418), [#68434](https://github.com/PaddlePaddle/Paddle/pull/68434), [#68445](https://github.com/PaddlePaddle/Paddle/pull/68445), [#68877](https://github.com/PaddlePaddle/Paddle/pull/68877), [#68993](https://github.com/PaddlePaddle/Paddle/pull/68993), [#69006](https://github.com/PaddlePaddle/Paddle/pull/69006), [#70471](https://github.com/PaddlePaddle/Paddle/pull/70471), [#70706](https://github.com/PaddlePaddle/Paddle/pull/70706), [#67777](https://github.com/PaddlePaddle/Paddle/pull/67777), [#65698](https://github.com/PaddlePaddle/Paddle/pull/65698), [#68433](https://github.com/PaddlePaddle/Paddle/pull/68433), [#65689](https://github.com/PaddlePaddle/Paddle/pull/65689) + +## 9. 
环境更新 + +- 优化了框架的稳定性和跨平台兼容性,修复了测试覆盖率及编译环境兼容性问题,并增强对 Windows/XPU/DCU 等多平台支持;同时精简了代码结构,移除废弃代码和无用依赖库以降低维护成本;升级 CUDA 等关键依赖,进一步优化 CI/CD 流程,提升构建速度并增强系统整体稳定性。 + +### Bug 修复 + +- 完善 CI/CD 流程并修复测试用例、解决不同环境下的编译安装问题, 提升框架稳定性和跨环境兼容性。 + [#65627](https://github.com/PaddlePaddle/Paddle/pull/65627), [#65736](https://github.com/PaddlePaddle/Paddle/pull/65736), [#65900](https://github.com/PaddlePaddle/Paddle/pull/65900), [#66069](https://github.com/PaddlePaddle/Paddle/pull/66069), [#67000](https://github.com/PaddlePaddle/Paddle/pull/67000), [#67312](https://github.com/PaddlePaddle/Paddle/pull/67312), [#67432](https://github.com/PaddlePaddle/Paddle/pull/67432), [#67540](https://github.com/PaddlePaddle/Paddle/pull/67540), [#67670](https://github.com/PaddlePaddle/Paddle/pull/67670), [#68449](https://github.com/PaddlePaddle/Paddle/pull/68449), [#70806](https://github.com/PaddlePaddle/Paddle/pull/70806), [#65665](https://github.com/PaddlePaddle/Paddle/pull/65665), [#65652](https://github.com/PaddlePaddle/Paddle/pull/65652), [#70644](https://github.com/PaddlePaddle/Paddle/pull/70644), [#68119](https://github.com/PaddlePaddle/Paddle/pull/68119), [#68466](https://github.com/PaddlePaddle/Paddle/pull/68466), [#68858](https://github.com/PaddlePaddle/Paddle/pull/68858), [#68788](https://github.com/PaddlePaddle/Paddle/pull/68788), [#68934](https://github.com/PaddlePaddle/Paddle/pull/68934), [#69883](https://github.com/PaddlePaddle/Paddle/pull/69883), [#69924](https://github.com/PaddlePaddle/Paddle/pull/69924), [#71187](https://github.com/PaddlePaddle/Paddle/pull/71187), [#70798](https://github.com/PaddlePaddle/Paddle/pull/70798), [#71248](https://github.com/PaddlePaddle/Paddle/pull/71248), [#70512](https://github.com/PaddlePaddle/Paddle/pull/70512), [#71363](https://github.com/PaddlePaddle/Paddle/pull/71363), [#71438](https://github.com/PaddlePaddle/Paddle/pull/71438), [#71291](https://github.com/PaddlePaddle/Paddle/pull/71291) + +### 改进升级 + +- 环境升级 + 
[#69491](https://github.com/PaddlePaddle/Paddle/pull/69491), [#66560](https://github.com/PaddlePaddle/Paddle/pull/66560), [#65686](https://github.com/PaddlePaddle/Paddle/pull/65686), [#71177](https://github.com/PaddlePaddle/Paddle/pull/71177), [#71284](https://github.com/PaddlePaddle/Paddle/pull/71284), [#69791](https://github.com/PaddlePaddle/Paddle/pull/69791), [#69349](https://github.com/PaddlePaddle/Paddle/pull/69349), [#70944](https://github.com/PaddlePaddle/Paddle/pull/70944), [#65411](https://github.com/PaddlePaddle/Paddle/pull/65411) +- 流水线合并 + [#66815](https://github.com/PaddlePaddle/Paddle/pull/66815), [#67306](https://github.com/PaddlePaddle/Paddle/pull/67306) +- DCU/NPU/KUNLUN 流水线完善 + [#67516](https://github.com/PaddlePaddle/Paddle/pull/67516), [#67629](https://github.com/PaddlePaddle/Paddle/pull/67629), [#67987](https://github.com/PaddlePaddle/Paddle/pull/67987), [#69903](https://github.com/PaddlePaddle/Paddle/pull/69903), [#68448](https://github.com/PaddlePaddle/Paddle/pull/68448), [#70401](https://github.com/PaddlePaddle/Paddle/pull/70401), [#71192](https://github.com/PaddlePaddle/Paddle/pull/71192), [#71197](https://github.com/PaddlePaddle/Paddle/pull/71197), [#68027](https://github.com/PaddlePaddle/Paddle/pull/68027) +- Windows 环境支持 + [#70390](https://github.com/PaddlePaddle/Paddle/pull/70390), [#70785](https://github.com/PaddlePaddle/Paddle/pull/70785), [#71286](https://github.com/PaddlePaddle/Paddle/pull/71286), [#71414](https://github.com/PaddlePaddle/Paddle/pull/71414), [#68901](https://github.com/PaddlePaddle/Paddle/pull/68901) +- 第三方库完善 + [#71419](https://github.com/PaddlePaddle/Paddle/pull/71419) +- 其他优化用于提升 CI 稳定性和执行效率 + [#67574](https://github.com/PaddlePaddle/Paddle/pull/67574), [#69058](https://github.com/PaddlePaddle/Paddle/pull/69058), [#70610](https://github.com/PaddlePaddle/Paddle/pull/70610), [#67093](https://github.com/PaddlePaddle/Paddle/pull/67093), [#69037](https://github.com/PaddlePaddle/Paddle/pull/69037), 
[#65213](https://github.com/PaddlePaddle/Paddle/pull/65213), [#65913](https://github.com/PaddlePaddle/Paddle/pull/65913), [#65947](https://github.com/PaddlePaddle/Paddle/pull/65947), [#66479](https://github.com/PaddlePaddle/Paddle/pull/66479), [#71054](https://github.com/PaddlePaddle/Paddle/pull/71054), [#71396](https://github.com/PaddlePaddle/Paddle/pull/71396) + +### 新特性 + +- 新增 GitHub Action 机制 + [#70571](https://github.com/PaddlePaddle/Paddle/pull/70571), [#70626](https://github.com/PaddlePaddle/Paddle/pull/70626), [#71325](https://github.com/PaddlePaddle/Paddle/pull/71325), [#71344](https://github.com/PaddlePaddle/Paddle/pull/71344), [#71353](https://github.com/PaddlePaddle/Paddle/pull/71353), [#71322](https://github.com/PaddlePaddle/Paddle/pull/71322), [#70415](https://github.com/PaddlePaddle/Paddle/pull/70415), [#70465](https://github.com/PaddlePaddle/Paddle/pull/70465), [#70524](https://github.com/PaddlePaddle/Paddle/pull/70524), [#70550](https://github.com/PaddlePaddle/Paddle/pull/70550), [#70564](https://github.com/PaddlePaddle/Paddle/pull/70564), [#70579](https://github.com/PaddlePaddle/Paddle/pull/70579), [#70580](https://github.com/PaddlePaddle/Paddle/pull/70580), [#70963](https://github.com/PaddlePaddle/Paddle/pull/70963), [#71200](https://github.com/PaddlePaddle/Paddle/pull/71200), [#71261](https://github.com/PaddlePaddle/Paddle/pull/71261), [#71265](https://github.com/PaddlePaddle/Paddle/pull/71265) + +### 废弃 + +- 废弃代码与依赖的清理,包括移除不再依赖的 Python 库以及简化编译配置,降低维护成本 + [#65635](https://github.com/PaddlePaddle/Paddle/pull/65635), [#67542](https://github.com/PaddlePaddle/Paddle/pull/67542), [#67609](https://github.com/PaddlePaddle/Paddle/pull/67609), [#69572](https://github.com/PaddlePaddle/Paddle/pull/69572), [#68150](https://github.com/PaddlePaddle/Paddle/pull/68150), [#67604](https://github.com/PaddlePaddle/Paddle/pull/67604), [#68561](https://github.com/PaddlePaddle/Paddle/pull/68561), [#68904](https://github.com/PaddlePaddle/Paddle/pull/68904), 
[#67219](https://github.com/PaddlePaddle/Paddle/pull/67219) + +## 10. 其他 + +- 与用户使用无关的改动,包括废弃代码清理、代码迁移、单测清理、调试或者监控机制升级等。 + +### 开发者相关内容 + +- 删除无用调试代码,代码迁移 + [#65256](https://github.com/PaddlePaddle/Paddle/pull/65256), [#65782](https://github.com/PaddlePaddle/Paddle/pull/65782), [#65836](https://github.com/PaddlePaddle/Paddle/pull/65836), [#65840](https://github.com/PaddlePaddle/Paddle/pull/65840), [#65862](https://github.com/PaddlePaddle/Paddle/pull/65862), [#65863](https://github.com/PaddlePaddle/Paddle/pull/65863), [#65987](https://github.com/PaddlePaddle/Paddle/pull/65987), [#66547](https://github.com/PaddlePaddle/Paddle/pull/66547), [#66556](https://github.com/PaddlePaddle/Paddle/pull/66556), [#66645](https://github.com/PaddlePaddle/Paddle/pull/66645), [#66646](https://github.com/PaddlePaddle/Paddle/pull/66646), [#66648](https://github.com/PaddlePaddle/Paddle/pull/66648), [#66672](https://github.com/PaddlePaddle/Paddle/pull/66672), [#66783](https://github.com/PaddlePaddle/Paddle/pull/66783), [#66083](https://github.com/PaddlePaddle/Paddle/pull/66083), [#65562](https://github.com/PaddlePaddle/Paddle/pull/65562), [#66564](https://github.com/PaddlePaddle/Paddle/pull/66564), [#66370](https://github.com/PaddlePaddle/Paddle/pull/66370), [#66912](https://github.com/PaddlePaddle/Paddle/pull/66912), [#66913](https://github.com/PaddlePaddle/Paddle/pull/66913), [#66914](https://github.com/PaddlePaddle/Paddle/pull/66914), [#66915](https://github.com/PaddlePaddle/Paddle/pull/66915), [#66664](https://github.com/PaddlePaddle/Paddle/pull/66664), [#66671](https://github.com/PaddlePaddle/Paddle/pull/66671), [#66121](https://github.com/PaddlePaddle/Paddle/pull/66121), [#65907](https://github.com/PaddlePaddle/Paddle/pull/65907), [#65949](https://github.com/PaddlePaddle/Paddle/pull/65949), [#65950](https://github.com/PaddlePaddle/Paddle/pull/65950), [#65954](https://github.com/PaddlePaddle/Paddle/pull/65954), [#66545](https://github.com/PaddlePaddle/Paddle/pull/66545), 
[#66649](https://github.com/PaddlePaddle/Paddle/pull/66649), [#66900](https://github.com/PaddlePaddle/Paddle/pull/66900), [#66901](https://github.com/PaddlePaddle/Paddle/pull/66901), [#66902](https://github.com/PaddlePaddle/Paddle/pull/66902), [#66903](https://github.com/PaddlePaddle/Paddle/pull/66903), [#66904](https://github.com/PaddlePaddle/Paddle/pull/66904), [#66906](https://github.com/PaddlePaddle/Paddle/pull/66906), [#66907](https://github.com/PaddlePaddle/Paddle/pull/66907), [#66908](https://github.com/PaddlePaddle/Paddle/pull/66908), [#66909](https://github.com/PaddlePaddle/Paddle/pull/66909), [#66549](https://github.com/PaddlePaddle/Paddle/pull/66549), [#66555](https://github.com/PaddlePaddle/Paddle/pull/66555), [#66647](https://github.com/PaddlePaddle/Paddle/pull/66647), [#66898](https://github.com/PaddlePaddle/Paddle/pull/66898), [#66886](https://github.com/PaddlePaddle/Paddle/pull/66886), [#66042](https://github.com/PaddlePaddle/Paddle/pull/66042), [#66043](https://github.com/PaddlePaddle/Paddle/pull/66043), [#66045](https://github.com/PaddlePaddle/Paddle/pull/66045), [#66046](https://github.com/PaddlePaddle/Paddle/pull/66046), [#65826](https://github.com/PaddlePaddle/Paddle/pull/65826), [#65825](https://github.com/PaddlePaddle/Paddle/pull/65825), [#65827](https://github.com/PaddlePaddle/Paddle/pull/65827), [#65829](https://github.com/PaddlePaddle/Paddle/pull/65829), [#65830](https://github.com/PaddlePaddle/Paddle/pull/65830), [#65831](https://github.com/PaddlePaddle/Paddle/pull/65831), [#66081](https://github.com/PaddlePaddle/Paddle/pull/66081), [#66082](https://github.com/PaddlePaddle/Paddle/pull/66082), [#66087](https://github.com/PaddlePaddle/Paddle/pull/66087), [#65980](https://github.com/PaddlePaddle/Paddle/pull/65980), [#65981](https://github.com/PaddlePaddle/Paddle/pull/65981), [#65983](https://github.com/PaddlePaddle/Paddle/pull/65983), [#65985](https://github.com/PaddlePaddle/Paddle/pull/65985), 
[#65979](https://github.com/PaddlePaddle/Paddle/pull/65979), [#65986](https://github.com/PaddlePaddle/Paddle/pull/65986), [#65988](https://github.com/PaddlePaddle/Paddle/pull/65988), [#65989](https://github.com/PaddlePaddle/Paddle/pull/65989), [#66682](https://github.com/PaddlePaddle/Paddle/pull/66682), [#66717](https://github.com/PaddlePaddle/Paddle/pull/66717), [#65802](https://github.com/PaddlePaddle/Paddle/pull/65802), [#66159](https://github.com/PaddlePaddle/Paddle/pull/66159), [#66147](https://github.com/PaddlePaddle/Paddle/pull/66147), [#66149](https://github.com/PaddlePaddle/Paddle/pull/66149), [#66150](https://github.com/PaddlePaddle/Paddle/pull/66150), [#65798](https://github.com/PaddlePaddle/Paddle/pull/65798), [#65731](https://github.com/PaddlePaddle/Paddle/pull/65731), [#66145](https://github.com/PaddlePaddle/Paddle/pull/66145), [#66086](https://github.com/PaddlePaddle/Paddle/pull/66086), [#65781](https://github.com/PaddlePaddle/Paddle/pull/65781), [#65837](https://github.com/PaddlePaddle/Paddle/pull/65837), [#65828](https://github.com/PaddlePaddle/Paddle/pull/65828), [#65864](https://github.com/PaddlePaddle/Paddle/pull/65864), [#65959](https://github.com/PaddlePaddle/Paddle/pull/65959), [#65706](https://github.com/PaddlePaddle/Paddle/pull/65706), [#66918](https://github.com/PaddlePaddle/Paddle/pull/66918), [#66191](https://github.com/PaddlePaddle/Paddle/pull/66191), [#66689](https://github.com/PaddlePaddle/Paddle/pull/66689), [#66808](https://github.com/PaddlePaddle/Paddle/pull/66808), [#65424](https://github.com/PaddlePaddle/Paddle/pull/65424), [#65452](https://github.com/PaddlePaddle/Paddle/pull/65452), [#65463](https://github.com/PaddlePaddle/Paddle/pull/65463), [#65478](https://github.com/PaddlePaddle/Paddle/pull/65478), [#65339](https://github.com/PaddlePaddle/Paddle/pull/65339) +- 规范化代码命名空间 + [#64755](https://github.com/PaddlePaddle/Paddle/pull/64755), [#64765](https://github.com/PaddlePaddle/Paddle/pull/64765), 
[#64767](https://github.com/PaddlePaddle/Paddle/pull/64767), [#64770](https://github.com/PaddlePaddle/Paddle/pull/64770), [#64775](https://github.com/PaddlePaddle/Paddle/pull/64775), [#64776](https://github.com/PaddlePaddle/Paddle/pull/64776), [#64757](https://github.com/PaddlePaddle/Paddle/pull/64757), [#64780](https://github.com/PaddlePaddle/Paddle/pull/64780), [#64777](https://github.com/PaddlePaddle/Paddle/pull/64777), [#64779](https://github.com/PaddlePaddle/Paddle/pull/64779), [#64758](https://github.com/PaddlePaddle/Paddle/pull/64758), [#64759](https://github.com/PaddlePaddle/Paddle/pull/64759), [#64762](https://github.com/PaddlePaddle/Paddle/pull/64762) +- 修改算子列表 + [#66573](https://github.com/PaddlePaddle/Paddle/pull/66573), [#65598](https://github.com/PaddlePaddle/Paddle/pull/65598), [#65100](https://github.com/PaddlePaddle/Paddle/pull/65100), [#65385](https://github.com/PaddlePaddle/Paddle/pull/65385), [#65192](https://github.com/PaddlePaddle/Paddle/pull/65192), [#65118](https://github.com/PaddlePaddle/Paddle/pull/65118), [#65108](https://github.com/PaddlePaddle/Paddle/pull/65108), [#65153](https://github.com/PaddlePaddle/Paddle/pull/65153), [#65465](https://github.com/PaddlePaddle/Paddle/pull/65465), [#65128](https://github.com/PaddlePaddle/Paddle/pull/65128), [#65420](https://github.com/PaddlePaddle/Paddle/pull/65420), [#65099](https://github.com/PaddlePaddle/Paddle/pull/65099), [#65207](https://github.com/PaddlePaddle/Paddle/pull/65207), [#66066](https://github.com/PaddlePaddle/Paddle/pull/66066), [#65400](https://github.com/PaddlePaddle/Paddle/pull/65400), [#65160](https://github.com/PaddlePaddle/Paddle/pull/65160), [#65195](https://github.com/PaddlePaddle/Paddle/pull/65195), [#65445](https://github.com/PaddlePaddle/Paddle/pull/65445), [#65479](https://github.com/PaddlePaddle/Paddle/pull/65479), [#65193](https://github.com/PaddlePaddle/Paddle/pull/65193), [#65401](https://github.com/PaddlePaddle/Paddle/pull/65401), 
[#66724](https://github.com/PaddlePaddle/Paddle/pull/66724), [#65164](https://github.com/PaddlePaddle/Paddle/pull/65164), [#65466](https://github.com/PaddlePaddle/Paddle/pull/65466), [#65661](https://github.com/PaddlePaddle/Paddle/pull/65661), [#65897](https://github.com/PaddlePaddle/Paddle/pull/65897), [#66022](https://github.com/PaddlePaddle/Paddle/pull/66022), [#65313](https://github.com/PaddlePaddle/Paddle/pull/65313), [#65616](https://github.com/PaddlePaddle/Paddle/pull/65616), [#65588](https://github.com/PaddlePaddle/Paddle/pull/65588), [#65174](https://github.com/PaddlePaddle/Paddle/pull/65174), [#65402](https://github.com/PaddlePaddle/Paddle/pull/65402), [#65154](https://github.com/PaddlePaddle/Paddle/pull/65154), [#65151](https://github.com/PaddlePaddle/Paddle/pull/65151), [#65098](https://github.com/PaddlePaddle/Paddle/pull/65098), [#64953](https://github.com/PaddlePaddle/Paddle/pull/64953), [#65122](https://github.com/PaddlePaddle/Paddle/pull/65122), [#65590](https://github.com/PaddlePaddle/Paddle/pull/65590), [#65152](https://github.com/PaddlePaddle/Paddle/pull/65152) +- Paddle 框架旧执行器功能退场 + [#65077](https://github.com/PaddlePaddle/Paddle/pull/65077), [#65340](https://github.com/PaddlePaddle/Paddle/pull/65340) +- 报错信息提示优化 + [#66668](https://github.com/PaddlePaddle/Paddle/pull/66668), [#66675](https://github.com/PaddlePaddle/Paddle/pull/66675), [#66605](https://github.com/PaddlePaddle/Paddle/pull/66605), [#66613](https://github.com/PaddlePaddle/Paddle/pull/66613), [#66507](https://github.com/PaddlePaddle/Paddle/pull/66507), [#66700](https://github.com/PaddlePaddle/Paddle/pull/66700), [#66739](https://github.com/PaddlePaddle/Paddle/pull/66739), [#66719](https://github.com/PaddlePaddle/Paddle/pull/66719), [#66733](https://github.com/PaddlePaddle/Paddle/pull/66733), [#66552](https://github.com/PaddlePaddle/Paddle/pull/66552), [#66548](https://github.com/PaddlePaddle/Paddle/pull/66548), [#66623](https://github.com/PaddlePaddle/Paddle/pull/66623), 
[#66702](https://github.com/PaddlePaddle/Paddle/pull/66702), [#66705](https://github.com/PaddlePaddle/Paddle/pull/66705), [#66718](https://github.com/PaddlePaddle/Paddle/pull/66718), [#66727](https://github.com/PaddlePaddle/Paddle/pull/66727), [#66860](https://github.com/PaddlePaddle/Paddle/pull/66860), [#66869](https://github.com/PaddlePaddle/Paddle/pull/66869), [#66933](https://github.com/PaddlePaddle/Paddle/pull/66933), [#66939](https://github.com/PaddlePaddle/Paddle/pull/66939), [#66553](https://github.com/PaddlePaddle/Paddle/pull/66553), [#66774](https://github.com/PaddlePaddle/Paddle/pull/66774), [#66794](https://github.com/PaddlePaddle/Paddle/pull/66794), [#66551](https://github.com/PaddlePaddle/Paddle/pull/66551), [#66540](https://github.com/PaddlePaddle/Paddle/pull/66540), [#66617](https://github.com/PaddlePaddle/Paddle/pull/66617), [#66841](https://github.com/PaddlePaddle/Paddle/pull/66841), [#66788](https://github.com/PaddlePaddle/Paddle/pull/66788), [#66954](https://github.com/PaddlePaddle/Paddle/pull/66954), [#66698](https://github.com/PaddlePaddle/Paddle/pull/66698), [#66782](https://github.com/PaddlePaddle/Paddle/pull/66782), [#66844](https://github.com/PaddlePaddle/Paddle/pull/66844), [#66443](https://github.com/PaddlePaddle/Paddle/pull/66443), [#66455](https://github.com/PaddlePaddle/Paddle/pull/66455), [#66517](https://github.com/PaddlePaddle/Paddle/pull/66517), [#66804](https://github.com/PaddlePaddle/Paddle/pull/66804), [#66802](https://github.com/PaddlePaddle/Paddle/pull/66802), [#66536](https://github.com/PaddlePaddle/Paddle/pull/66536), [#66707](https://github.com/PaddlePaddle/Paddle/pull/66707), [#66525](https://github.com/PaddlePaddle/Paddle/pull/66525), [#66753](https://github.com/PaddlePaddle/Paddle/pull/66753), [#66550](https://github.com/PaddlePaddle/Paddle/pull/66550), [#66857](https://github.com/PaddlePaddle/Paddle/pull/66857), [#66471](https://github.com/PaddlePaddle/Paddle/pull/66471), 
[#66628](https://github.com/PaddlePaddle/Paddle/pull/66628), [#66469](https://github.com/PaddlePaddle/Paddle/pull/66469), [#66775](https://github.com/PaddlePaddle/Paddle/pull/66775), [#66506](https://github.com/PaddlePaddle/Paddle/pull/66506), [#66780](https://github.com/PaddlePaddle/Paddle/pull/66780), [#66953](https://github.com/PaddlePaddle/Paddle/pull/66953), [#66695](https://github.com/PaddlePaddle/Paddle/pull/66695), [#66603](https://github.com/PaddlePaddle/Paddle/pull/66603), [#66491](https://github.com/PaddlePaddle/Paddle/pull/66491), [#66715](https://github.com/PaddlePaddle/Paddle/pull/66715), [#66632](https://github.com/PaddlePaddle/Paddle/pull/66632), [#66594](https://github.com/PaddlePaddle/Paddle/pull/66594), [#66615](https://github.com/PaddlePaddle/Paddle/pull/66615), [#66578](https://github.com/PaddlePaddle/Paddle/pull/66578), [#66534](https://github.com/PaddlePaddle/Paddle/pull/66534), [#66569](https://github.com/PaddlePaddle/Paddle/pull/66569), [#66529](https://github.com/PaddlePaddle/Paddle/pull/66529), [#66530](https://github.com/PaddlePaddle/Paddle/pull/66530), [#66522](https://github.com/PaddlePaddle/Paddle/pull/66522), [#66789](https://github.com/PaddlePaddle/Paddle/pull/66789), [#66600](https://github.com/PaddlePaddle/Paddle/pull/66600), [#66511](https://github.com/PaddlePaddle/Paddle/pull/66511), [#66512](https://github.com/PaddlePaddle/Paddle/pull/66512), [#66527](https://github.com/PaddlePaddle/Paddle/pull/66527), [#66518](https://github.com/PaddlePaddle/Paddle/pull/66518), [#66958](https://github.com/PaddlePaddle/Paddle/pull/66958), [#66532](https://github.com/PaddlePaddle/Paddle/pull/66532), [#65258](https://github.com/PaddlePaddle/Paddle/pull/65258), [#66487](https://github.com/PaddlePaddle/Paddle/pull/66487), [#66876](https://github.com/PaddlePaddle/Paddle/pull/66876), [#66832](https://github.com/PaddlePaddle/Paddle/pull/66832), [#66872](https://github.com/PaddlePaddle/Paddle/pull/66872), 
[#66830](https://github.com/PaddlePaddle/Paddle/pull/66830), [#66708](https://github.com/PaddlePaddle/Paddle/pull/66708), [#66502](https://github.com/PaddlePaddle/Paddle/pull/66502), [#66521](https://github.com/PaddlePaddle/Paddle/pull/66521), [#66592](https://github.com/PaddlePaddle/Paddle/pull/66592) + +### 废弃 + +- 废弃代码清理、无用单测清理 + [#65894](https://github.com/PaddlePaddle/Paddle/pull/65894), [#66165](https://github.com/PaddlePaddle/Paddle/pull/66165), [#66293](https://github.com/PaddlePaddle/Paddle/pull/66293), [#66102](https://github.com/PaddlePaddle/Paddle/pull/66102), [#66442](https://github.com/PaddlePaddle/Paddle/pull/66442), [#66922](https://github.com/PaddlePaddle/Paddle/pull/66922), [#66531](https://github.com/PaddlePaddle/Paddle/pull/66531), [#65518](https://github.com/PaddlePaddle/Paddle/pull/65518), [#66800](https://github.com/PaddlePaddle/Paddle/pull/66800), [#66372](https://github.com/PaddlePaddle/Paddle/pull/66372), [#65902](https://github.com/PaddlePaddle/Paddle/pull/65902), [#65462](https://github.com/PaddlePaddle/Paddle/pull/65462), [#65327](https://github.com/PaddlePaddle/Paddle/pull/65327), [#65189](https://github.com/PaddlePaddle/Paddle/pull/65189), [#65181](https://github.com/PaddlePaddle/Paddle/pull/65181), [#66535](https://github.com/PaddlePaddle/Paddle/pull/66535), [#65383](https://github.com/PaddlePaddle/Paddle/pull/65383), [#65173](https://github.com/PaddlePaddle/Paddle/pull/65173), [#66429](https://github.com/PaddlePaddle/Paddle/pull/66429), [#66386](https://github.com/PaddlePaddle/Paddle/pull/66386), [#66447](https://github.com/PaddlePaddle/Paddle/pull/66447), [#66367](https://github.com/PaddlePaddle/Paddle/pull/66367), [#66160](https://github.com/PaddlePaddle/Paddle/pull/66160), [#65408](https://github.com/PaddlePaddle/Paddle/pull/65408), [#65433](https://github.com/PaddlePaddle/Paddle/pull/65433), [#65481](https://github.com/PaddlePaddle/Paddle/pull/65481), [#65444](https://github.com/PaddlePaddle/Paddle/pull/65444), 
[#65389](https://github.com/PaddlePaddle/Paddle/pull/65389), [#65663](https://github.com/PaddlePaddle/Paddle/pull/65663), [#65649](https://github.com/PaddlePaddle/Paddle/pull/65649), [#65629](https://github.com/PaddlePaddle/Paddle/pull/65629), [#66142](https://github.com/PaddlePaddle/Paddle/pull/66142), [#65796](https://github.com/PaddlePaddle/Paddle/pull/65796), [#66163](https://github.com/PaddlePaddle/Paddle/pull/66163), [#66291](https://github.com/PaddlePaddle/Paddle/pull/66291), [#65480](https://github.com/PaddlePaddle/Paddle/pull/65480), [#65495](https://github.com/PaddlePaddle/Paddle/pull/65495), [#65498](https://github.com/PaddlePaddle/Paddle/pull/65498), [#65503](https://github.com/PaddlePaddle/Paddle/pull/65503), [#65502](https://github.com/PaddlePaddle/Paddle/pull/65502), [#65501](https://github.com/PaddlePaddle/Paddle/pull/65501), [#65512](https://github.com/PaddlePaddle/Paddle/pull/65512), [#65528](https://github.com/PaddlePaddle/Paddle/pull/65528), [#65472](https://github.com/PaddlePaddle/Paddle/pull/65472), [#65390](https://github.com/PaddlePaddle/Paddle/pull/65390), [#65344](https://github.com/PaddlePaddle/Paddle/pull/65344), [#65384](https://github.com/PaddlePaddle/Paddle/pull/65384), [#65388](https://github.com/PaddlePaddle/Paddle/pull/65388), [#65198](https://github.com/PaddlePaddle/Paddle/pull/65198), [#65248](https://github.com/PaddlePaddle/Paddle/pull/65248), [#65443](https://github.com/PaddlePaddle/Paddle/pull/65443), [#65430](https://github.com/PaddlePaddle/Paddle/pull/65430) -## 12.其他升级内容 
-与用户使用无关的改动,包括废弃代码清理、无用单测清理、调试或者监控机制升级等。[#63377](https://github.com/PaddlePaddle/Paddle/pull/63377),[#64106](https://github.com/PaddlePaddle/Paddle/pull/64106),[#64220](https://github.com/PaddlePaddle/Paddle/pull/64220),[#64293](https://github.com/PaddlePaddle/Paddle/pull/64293),[#64464](https://github.com/PaddlePaddle/Paddle/pull/64464),[#64944](https://github.com/PaddlePaddle/Paddle/pull/64944),[#63638](https://github.com/PaddlePaddle/Paddle/pull/63638),[#63732](https://github.com/PaddlePaddle/Paddle/pull/63732),[#63735](https://github.com/PaddlePaddle/Paddle/pull/63735),[#63826](https://github.com/PaddlePaddle/Paddle/pull/63826),[#63982](https://github.com/PaddlePaddle/Paddle/pull/63982),[#63737](https://github.com/PaddlePaddle/Paddle/pull/63737),[#64471](https://github.com/PaddlePaddle/Paddle/pull/64471),[#64574](https://github.com/PaddlePaddle/Paddle/pull/64574),[#64494](https://github.com/PaddlePaddle/Paddle/pull/64494),[#62775](https://github.com/PaddlePaddle/Paddle/pull/62775),[#63601](https://github.com/PaddlePaddle/Paddle/pull/63601),[#62564](https://github.com/PaddlePaddle/Paddle/pull/62564),[#63772](https://github.com/PaddlePaddle/Paddle/pull/63772),[#64719](https://github.com/PaddlePaddle/Paddle/pull/64719),[#61640](https://github.com/PaddlePaddle/Paddle/pull/61640),[#63459](https://github.com/PaddlePaddle/Paddle/pull/63459),[#64062](https://github.com/PaddlePaddle/Paddle/pull/64062),[#63480](https://github.com/PaddlePaddle/Paddle/pull/63480),[#63833](https://github.com/PaddlePaddle/Paddle/pull/63833)[#63673](https://github.com/PaddlePaddle/Paddle/pull/63673),[#63672](https://github.com/PaddlePaddle/Paddle/pull/63672),[#64131](https://github.com/PaddlePaddle/Paddle/pull/64131),[#64156](https://github.com/PaddlePaddle/Paddle/pull/64156),[#64155](https://github.com/PaddlePaddle/Paddle/pull/64155),[#64159](https://github.com/PaddlePaddle/Paddle/pull/64159),[#63902](https://github.com/PaddlePaddle/Paddle/pull/63902),[#64230](https://github.com/PaddlePaddle/
Paddle/pull/64230),[#64229](https://github.com/PaddlePaddle/Paddle/pull/64229),[#64236](https://github.com/PaddlePaddle/Paddle/pull/64236),[#64260](https://github.com/PaddlePaddle/Paddle/pull/64260),[#64175](https://github.com/PaddlePaddle/Paddle/pull/64175),[#64250](https://github.com/PaddlePaddle/Paddle/pull/64250),[#64269](https://github.com/PaddlePaddle/Paddle/pull/64269),[#64238](https://github.com/PaddlePaddle/Paddle/pull/64238),[#64349](https://github.com/PaddlePaddle/Paddle/pull/64349),[#64394](https://github.com/PaddlePaddle/Paddle/pull/64394),[#64402](https://github.com/PaddlePaddle/Paddle/pull/64402),[#64401](https://github.com/PaddlePaddle/Paddle/pull/64401),[#64388](https://github.com/PaddlePaddle/Paddle/pull/64388),[#64329](https://github.com/PaddlePaddle/Paddle/pull/64329),[#64502](https://github.com/PaddlePaddle/Paddle/pull/64502),[#64501](https://github.com/PaddlePaddle/Paddle/pull/64501),[#64515](https://github.com/PaddlePaddle/Paddle/pull/64515),[#64503](https://github.com/PaddlePaddle/Paddle/pull/64503),[#64514](https://github.com/PaddlePaddle/Paddle/pull/64514),[#64601](https://github.com/PaddlePaddle/Paddle/pull/64601),[#64564](https://github.com/PaddlePaddle/Paddle/pull/64564),[#64012](https://github.com/PaddlePaddle/Paddle/pull/64012),[#64697](https://github.com/PaddlePaddle/Paddle/pull/64697),[#64682](https://github.com/PaddlePaddle/Paddle/pull/64682),[#64051](https://github.com/PaddlePaddle/Paddle/pull/64051),[#63267](https://github.com/PaddlePaddle/Paddle/pull/63267),[#63426](https://github.com/PaddlePaddle/Paddle/pull/63426),[#63626](https://github.com/PaddlePaddle/Paddle/pull/63626),[#63257](https://github.com/PaddlePaddle/Paddle/pull/63257),[#63266](https://github.com/PaddlePaddle/Paddle/pull/63266),[#63468](https://github.com/PaddlePaddle/Paddle/pull/63468),[#63262](https://github.com/PaddlePaddle/Paddle/pull/63262),[#63248](https://github.com/PaddlePaddle/Paddle/pull/63248),[#63241](https://github.com/PaddlePaddle/Paddle/pull/63241),[
#63252](https://github.com/PaddlePaddle/Paddle/pull/63252),[#63258](https://github.com/PaddlePaddle/Paddle/pull/63258),[#63235](https://github.com/PaddlePaddle/Paddle/pull/63235),[#63399](https://github.com/PaddlePaddle/Paddle/pull/63399),[#63488](https://github.com/PaddlePaddle/Paddle/pull/63488),[#63487](https://github.com/PaddlePaddle/Paddle/pull/63487),[#63466](https://github.com/PaddlePaddle/Paddle/pull/63466),[#63464](https://github.com/PaddlePaddle/Paddle/pull/63464),[#63483](https://github.com/PaddlePaddle/Paddle/pull/63483),[#63486](https://github.com/PaddlePaddle/Paddle/pull/63486),[#63475](https://github.com/PaddlePaddle/Paddle/pull/63475),[#63489](https://github.com/PaddlePaddle/Paddle/pull/63489),[#63470](https://github.com/PaddlePaddle/Paddle/pull/63470),[#63457](https://github.com/PaddlePaddle/Paddle/pull/63457),[#63493](https://github.com/PaddlePaddle/Paddle/pull/63493),[#63561](https://github.com/PaddlePaddle/Paddle/pull/63561),[#63584](https://github.com/PaddlePaddle/Paddle/pull/63584),[#63587](https://github.com/PaddlePaddle/Paddle/pull/63587),[#63586](https://github.com/PaddlePaddle/Paddle/pull/63586),[#63569](https://github.com/PaddlePaddle/Paddle/pull/63569),[#63559](https://github.com/PaddlePaddle/Paddle/pull/63559),[#63558](https://github.com/PaddlePaddle/Paddle/pull/63558),[#63555](https://github.com/PaddlePaddle/Paddle/pull/63555),[#63543](https://github.com/PaddlePaddle/Paddle/pull/63543),[#63589](https://github.com/PaddlePaddle/Paddle/pull/63589),[#63583](https://github.com/PaddlePaddle/Paddle/pull/63583),[#63565](https://github.com/PaddlePaddle/Paddle/pull/63565),[#63564](https://github.com/PaddlePaddle/Paddle/pull/63564),[#63265](https://github.com/PaddlePaddle/Paddle/pull/63265),[#63562](https://github.com/PaddlePaddle/Paddle/pull/63562),[#63591](https://github.com/PaddlePaddle/Paddle/pull/63591),[#63460](https://github.com/PaddlePaddle/Paddle/pull/63460),[#63238](https://github.com/PaddlePaddle/Paddle/pull/63238),[#63631](https://gith
ub.com/PaddlePaddle/Paddle/pull/63631),[#63707](https://github.com/PaddlePaddle/Paddle/pull/63707),[#63714](https://github.com/PaddlePaddle/Paddle/pull/63714),[#63854](https://github.com/PaddlePaddle/Paddle/pull/63854),[#63929](https://github.com/PaddlePaddle/Paddle/pull/63929),[#63532](https://github.com/PaddlePaddle/Paddle/pull/63532),[#59628](https://github.com/PaddlePaddle/Paddle/pull/59628),[#62209](https://github.com/PaddlePaddle/Paddle/pull/62209),[#63742](https://github.com/PaddlePaddle/Paddle/pull/63742),[#60518](https://github.com/PaddlePaddle/Paddle/pull/60518),[#62078](https://github.com/PaddlePaddle/Paddle/pull/62078),[#62684](https://github.com/PaddlePaddle/Paddle/pull/62684),[#62723](https://github.com/PaddlePaddle/Paddle/pull/62723),[#64141](https://github.com/PaddlePaddle/Paddle/pull/64141),[#60404](https://github.com/PaddlePaddle/Paddle/pull/60404),[#64212](https://github.com/PaddlePaddle/Paddle/pull/64212),[#60652](https://github.com/PaddlePaddle/Paddle/pull/60652),[#64545](https://github.com/PaddlePaddle/Paddle/pull/64545),[#64477](https://github.com/PaddlePaddle/Paddle/pull/64477),[#64556](https://github.com/PaddlePaddle/Paddle/pull/64556),[#63160](https://github.com/PaddlePaddle/Paddle/pull/63160),[#63796](https://github.com/PaddlePaddle/Paddle/pull/63796),[#64693](https://github.com/PaddlePaddle/Paddle/pull/64693),[#64484](https://github.com/PaddlePaddle/Paddle/pull/64484),[#64677](https://github.com/PaddlePaddle/Paddle/pull/64677),[#64461](https://github.com/PaddlePaddle/Paddle/pull/64461),[#63189](https://github.com/PaddlePaddle/Paddle/pull/63189),[#63855](https://github.com/PaddlePaddle/Paddle/pull/63855),[#63896](https://github.com/PaddlePaddle/Paddle/pull/63896),[#63193](https://github.com/PaddlePaddle/Paddle/pull/63193),[#63200](https://github.com/PaddlePaddle/Paddle/pull/63200),[#63406](https://github.com/PaddlePaddle/Paddle/pull/63406),[#61283](https://github.com/PaddlePaddle/Paddle/pull/61283),[#63607](https://github.com/PaddlePaddle/
Paddle/pull/63607),[#64486](https://github.com/PaddlePaddle/Paddle/pull/64486),[#64004](https://github.com/PaddlePaddle/Paddle/pull/64004),[#63132](https://github.com/PaddlePaddle/Paddle/pull/63132),[#63553](https://github.com/PaddlePaddle/Paddle/pull/63553),[#63572](https://github.com/PaddlePaddle/Paddle/pull/63572),[#63794](https://github.com/PaddlePaddle/Paddle/pull/63794),[#63919](https://github.com/PaddlePaddle/Paddle/pull/63919),[#63980](https://github.com/PaddlePaddle/Paddle/pull/63980),[#62917](https://github.com/PaddlePaddle/Paddle/pull/62917),[#64451](https://github.com/PaddlePaddle/Paddle/pull/64451),[#63541](https://github.com/PaddlePaddle/Paddle/pull/63541),[#63703](https://github.com/PaddlePaddle/Paddle/pull/63703),[#64536](https://github.com/PaddlePaddle/Paddle/pull/64536),[#63264](https://github.com/PaddlePaddle/Paddle/pull/63264),[#63335](https://github.com/PaddlePaddle/Paddle/pull/63335),[#63841](https://github.com/PaddlePaddle/Paddle/pull/63841),[#64628](https://github.com/PaddlePaddle/Paddle/pull/64628),[#63419](https://github.com/PaddlePaddle/Paddle/pull/63419),[#62210](https://github.com/PaddlePaddle/Paddle/pull/62210),[#63557](https://github.com/PaddlePaddle/Paddle/pull/63557),[#63064](https://github.com/PaddlePaddle/Paddle/pull/63064),[#61442](https://github.com/PaddlePaddle/Paddle/pull/61442),[#63537](https://github.com/PaddlePaddle/Paddle/pull/63537),[#63839](https://github.com/PaddlePaddle/Paddle/pull/63839),[#60927](https://github.com/PaddlePaddle/Paddle/pull/60927),[#60566](https://github.com/PaddlePaddle/Paddle/pull/60566),[#60842](https://github.com/PaddlePaddle/Paddle/pull/60842),[#64612](https://github.com/PaddlePaddle/Paddle/pull/64612),[#60047](https://github.com/PaddlePaddle/Paddle/pull/60047),[#63898](https://github.com/PaddlePaddle/Paddle/pull/63898),[#60415](https://github.com/PaddlePaddle/Paddle/pull/60415),[#60474](https://github.com/PaddlePaddle/Paddle/pull/60474),[#60439](https://github.com/PaddlePaddle/Paddle/pull/60439),[
#60565](https://github.com/PaddlePaddle/Paddle/pull/60565),[#64414](https://github.com/PaddlePaddle/Paddle/pull/64414),[#62526](https://github.com/PaddlePaddle/Paddle/pull/62526),[#54183](https://github.com/PaddlePaddle/Paddle/pull/54183),[#64096](https://github.com/PaddlePaddle/Paddle/pull/64096),[#61325](https://github.com/PaddlePaddle/Paddle/pull/61325),[#60629](https://github.com/PaddlePaddle/Paddle/pull/60629),[#61051](https://github.com/PaddlePaddle/Paddle/pull/61051),[#62103](https://github.com/PaddlePaddle/Paddle/pull/62103),[#63594](https://github.com/PaddlePaddle/Paddle/pull/63594),[#60968](https://github.com/PaddlePaddle/Paddle/pull/60968),[#64613](https://github.com/PaddlePaddle/Paddle/pull/64613),[#64073](https://github.com/PaddlePaddle/Paddle/pull/64073),[#63816](https://github.com/PaddlePaddle/Paddle/pull/63816),[#64416](https://github.com/PaddlePaddle/Paddle/pull/64416),[#62499](https://github.com/PaddlePaddle/Paddle/pull/62499),[#64531](https://github.com/PaddlePaddle/Paddle/pull/64531),[#63827](https://github.com/PaddlePaddle/Paddle/pull/63827),[#59885](https://github.com/PaddlePaddle/Paddle/pull/59885),[#59949](https://github.com/PaddlePaddle/Paddle/pull/59949),[#63428](https://github.com/PaddlePaddle/Paddle/pull/63428),[#63218](https://github.com/PaddlePaddle/Paddle/pull/63218),[#63538](https://github.com/PaddlePaddle/Paddle/pull/63538),[#64497](https://github.com/PaddlePaddle/Paddle/pull/64497),[#63082](https://github.com/PaddlePaddle/Paddle/pull/63082),[#64395](https://github.com/PaddlePaddle/Paddle/pull/64395),[#60183](https://github.com/PaddlePaddle/Paddle/pull/60183),[#63691](https://github.com/PaddlePaddle/Paddle/pull/63691),[#64428](https://github.com/PaddlePaddle/Paddle/pull/64428),[#64648](https://github.com/PaddlePaddle/Paddle/pull/64648),[#64650](https://github.com/PaddlePaddle/Paddle/pull/64650),[#59926](https://github.com/PaddlePaddle/Paddle/pull/59926),[#59750](https://github.com/PaddlePaddle/Paddle/pull/59750),[#60080](https://gith
ub.com/PaddlePaddle/Paddle/pull/60080),[#60208](https://github.com/PaddlePaddle/Paddle/pull/60208),[#64124](https://github.com/PaddlePaddle/Paddle/pull/64124),[#64187](https://github.com/PaddlePaddle/Paddle/pull/64187),[#64166](https://github.com/PaddlePaddle/Paddle/pull/64166),[#64284](https://github.com/PaddlePaddle/Paddle/pull/64284),[#64253](https://github.com/PaddlePaddle/Paddle/pull/64253),[#64555](https://github.com/PaddlePaddle/Paddle/pull/64555),[#59878](https://github.com/PaddlePaddle/Paddle/pull/59878),[#64081](https://github.com/PaddlePaddle/Paddle/pull/64081) +## 11. 贡献者名单 -## 13.贡献者名单 -6clc, Android zhang, Asthestarsfalll, Ataf Fazledin Ahamed, Aurelius84, AyaseNana, Baizhou Zhang, bapijun, BiynXu, Botao Zhou, Bo Zhang, bukejiyu, caozhou, chalsliu, Chang Xu, Charles-hit, chen2016013, Chen Zhiyang, C.J.0_0, cmcamdy, co63oc, coco, cyber-pioneer, cyberslack_lee, danleifeng, diadestiny, Difer, Dmovic, Eddie-Wang, Eddie Zhang, engineer1109, enzodechine, fanhaoxuee, feifei-111, flying-forever, Frank Lin, freeliuzc, fsczz, Galaxy1458, GGBond8488, Ghost Screaming, gongweibao, gouzil, Guoxia Wang, handiz, HankYang, Haohongxiang, haosicheng, hess, hjyp, hong, Hongqing-work, Hongwen Xin, HongyuJia, houj04, huangjiyi, Huihuang Zheng, hxzd5568, hyDONG, HydrogenSulfate, idontkonwher, iLeGend, Jeng Bai-Cheng, Jianbang Yang, Jia Wenxuan, JYChen, jzhang533, JZ-LIANG, Kai Song, kangguangli, kevin, Kunbo Ding, lanxianghit, Leo Chen, Leo Guo, lijialin03, lijin23, linkk08, Liujie0926, Liuyinfeng, liu zhengxi, liuzhenhai93, liym27, LiYuRio, lizexu123, LoneRanger, Longzhi Wang, Lucas, Lu Qi, lzy, lzydev, MayYouBeProsperous, megemini, Meiyim, ming1753, Mingdong Wang, ndren, NeroLoh, NetPunk, Nguyen Cong Vinh, Nyakku Shigure, Omri Alon, onepick, ooo oo, pangengzheng, PommesPeter, Qi Li, QingshuChen, Qi Shao, RedContritio, Reese Wang, RichardWooSJTU, risemeup1, Roc, ronnywang, Ruibiao Chen, Ruibin Cheung, RuohengMa, Ryan, Shaopeng Ling, ShenLiang, Shijie, Shuhao Liang, Siming 
Dai, skywalker2012, smallpoxscattered, sneaxiy, Sonder, Sunny-bot1, Tao Luo, tc20042008, Terry, Tian, tianhaodongbd, tianshuo78520a, Tianyu Feng, Tian Zheng, Tongkai, Travis-Lee, unseenme, Vigi Zhang, walkalone20, Wang Bojun, wanghuancoder, wangna11BD, Wang Xin, Wangzheee, WangZhen, wanly young, wawltor, wendaxiao, Wen Sun, wentao yu, Wenyu, wenzhe.wang, Winters Montagne, winter-wang, WoWYoYLoL, Wu Chencan, Wu Fei, wuhuachaocoding, Xianduo Li, XiangGao, XiaociZhang, xiaoguoguo626807, xiaoxiaohehe001, Xiao Xiyuan, Xiaoxu Chen, xiaoyao0115, xiaoye, xingmingyyj, Xinyi_LI, Xinyu Yang, xiongkun, xuxinyi389, xysheng-baidu, yangguohao, YibLiu, Yichen Zhang, yinfan98, yinwei, Yiqun Liu, YKTian, Yuang Liu, Yuanle Liu, YuanRisheng, yuguo, yujun, yulangz, YUNSHEN XIE, zbt78, ZelinMa557, Zero Rains, Zeyu Chen, zhangbo9674, Zhang,Lirong, Zhang Ting, zhangyikun02, zhangyuqin1998, Zhan Rongrui, zhaohaixu, zhaoyingli, Zhenghai Zhang, zhengzhonghui, zhink, ZhouMengLei1999, zhouzj, zhupengyang, zhurou603, zhuyipin, zhwesky2010, Zichao, zxcd, zyfncg, zyt1024, 东百月, 傅剑寒, 周周周, 周波涛, 张春乔, 萧 +0x3878f, 0x45f, 2742195759, 86kkd, A-nnonymous, ADream-ki, Aganlengzi, Albresky, AndPuQing, AndSonder, Aoraki-Dream, ApricityXX, Asthestarsfalll, Aurelius84, BHmingyang, BeingGod, Betelgeu, BiynXu, CJ77Qi, Caogration, DDDivano, Dale1314, Deleter-D, DesmonDay, Difers, Dmovic, DongBaiYue, DrRyanHuang, DrownFish19, Eddie-Wang1120, EgoistSA, FeixLiu, ForFishes, Fripping, From00, Function-Samuel, GoldenStain, Guanhuachen2003, GuoxiaWang, Hanyonggong, HarperCy, Hongqing-work, HydrogenSulfate, JZ-LIANG, Jeff114514, JiaWenxuan, LLee233, LanCole, Lans1ot, Layssy, Leoforever123, LiYuRio, LielinJiang, LittleHeroZZZX, Liujie0926, Liyulingyue, Luohongzhige, Marcusryz, MarisaSparkL, Micalling, MikhayEeer, MrXnneHang, MufanColin, NKNaN, Neo-WY, NeroLoh, PolaKuma, Qin-sx, QingshuChen, RachelXu7, RichardWooSJTU, RuohengMa, SCUcookie, Sekiro-x, SigureMo, Sunny-bot1, SylarTiaNII, Sylence8, TBD1, TR666, TimeYWL, 
Tom-Zheng, Turingg, Victor-Bayim, Vvsmile, WAYKEN-TSE, Wanglongzhi2001, Wangzheee, Waynezee, Wennie396, Whsjrczr, Wizard-ZP, Wong4j, XavierZXY, XiaociZhang, XieYunshen, Xing-lil, Xreki, YKTian-x2b, YZW-explorer, YanhuiDua, YuanRisheng, ZHOU05030, ZhangHandi, ZhangX-21, ZibinGuo, a2064968462, anderson101866, aooxin, aquagull, baoqiwen, bapijun, blacksheep-Aristotle, bukejiyu, carryyu, ccsuzzh, chang-wenbin, changeyoung98, chen2016013, ckl117, cmcamdy, co63oc, continue-coding, cqulilujia, crazyxiaoxi, cszdrg, cubehan3, cyber-pioneer, danleifeng, decade-afk, deepllz, dynamicheart, eee4017, eggman-1024, enkilee, epiphanyer, ethan-sem, fangfangssj, feixi21, fightfat, fufu0615, fxfxfxfxfxfxfxfx, fxy1699, gitliuyf, gongel, gongshaotian, gongweibao, gouzil, gsq7474741, guixxiic, gzy19990617, hanyang2508, haoyu2022, heavyrain-lzy, houj04, huangjiyi, huangkr03, hxzd5568, icpcccpc, inaomIIsfarell, iosmers, jeff41404, jerrywgz, jiachengdai, jiahy0825, jinmingyi1998, jinyouzhi, joseflv, jychen21, jzhang533, kangguangli, kanze1, kineast, kircle888, l1cacheDell, leo0519, lifulll, linkk08, little1d, liufengwei0103, liuruyan, lixcli, liym27, liyongchao911, lizexu123, lizhenyun01, lj970926, lshpku, lszxb, ltd0924, luotao1, lwkhahaha, lxd-cumt, mayang002, megemini, mikemikimike, ming1753, monster1015, mori0umi, ndyysheep, nizne9, nobodynobody, ooooo-create, penPenf28, phlrain, pkuzyc, qili93, rich04lin, risemeup1, ronny1996, rsmallblue, runzhech, skywalker2012, smile2game, sneaxiy, successfulbarrier, sunzhongkai588, swgu98, tc20042008, tianhaodongbd, tianshuo78520a, tizhou86, tlxd, uanu2002, umiswing, vivienfanghuagood, waliwali777, walkalone20, wanghuancoder, wangna11BD, will-jl944, winffke, winter-wang, wwwuyan, xiaoguoguo626807, xiaoluomi, xiaoyao0115, xingmingyyj, xkkkkkk23, xu8117, xuxinyi389, xz-alex, yangrongxinuser, yeteye, yinfan98, yongqiangma, yuan20041218, yuanlehome, yuguo-Jack, yumin066, zbt78, zeroRains, zhangbo9674, zhanghonggeng, zhanglirong1999, zhangting2020, 
zhangyk0314, zhangyuqin1998, zhiminzhang0830, zhink, zhiqiu, zhouquan32, zhoutianzi666, zhwesky2010, zoooo0820, zrr1999, zty-king, zxcd, zyfncg
diff --git a/docs/release_note_en.md b/docs/release_note_en.md
index 0052edd54df..a698d6d81a3 100644
--- a/docs/release_note_en.md
+++ b/docs/release_note_en.md
@@ -1,537 +1,543 @@
-# 3.0 Beta Release Note
+# 3.0 Release Note
-# Overview of PaddlePaddle 3.0 Beta
+Declaration: This document is translated by [Baidu Translate](https://fanyi.baidu.com/)
-The core features of this version mainly include new technologies such as dynamic-static unity auto parallel and automatic optimization of neural network compiler, to aim to address the new challenges in the current deep learning field.PaddlePaddle Framework 3.0 Beta extends the design concepts of 2.x such as dynamic-static unity and integrated training and inference. The development interface is fully compatible with 2.x version. This means that codes developed in version 2.x can run directly on version 3.x without modification in most cases. Several key features are detailed as follows:
+As China's first independently developed industrial-grade deep learning platform, PaddlePaddle has always adhered to the open-source path, supporting the intelligent upgrade of industries. The PaddlePaddle framework version 3.0 not only continues the characteristics of the PaddlePaddle framework 2.0 series, which unifies static and dynamic operations and integrates training and inference, but also achieves breakthroughs in automatic parallelism, neural network compilers, and high-order automatic differentiation, providing strong support for technological innovation and industrial applications in the era of large models, and creating a one-stop, high-performance deep learning development experience for developers.
Whether it is cutting-edge algorithm research or the implementation of industrial-grade large models, PaddlePaddle framework version 3.0 will become the preferred tool for developers. Key features are described as follows: -- Dynamic-static graph unified auto parallel: To make the parallel training programming of large models easier, PaddlePaddle has also optimized the semi-auto parallel programming paradigm with dynamic-static graph unified. Developers do not need to delve into the complex concepts and APIs need in manual parallel programming; developers only need to perform a small amount of tensor sharding annotation to complete the construction of hybrid parallelism for large models. The framework is able to automatically derive distributed sharding states and add communication operators, and also supports one-key dynamic-to-static distributed training, thus dramatically simplifying the development of hybrid parallel training codes. In terms of dynamic-static unity, PaddlePaddle has comprehensively upgraded its dynamic-to-static training capability by adopting bytecode-based dynamic-static conversion technology, to support adaptive graph construction functions. It has been verified on more than 700 PaddlePaddle industrial-grade models, achieving a 100% success rate of one-key dynamic-to-static training. -- Automatic optimization of neural network compiler: PaddlePaddle Compiler Infrastructure for Neural Networks (CINN) adopts the design of integration with the framework, supporting the efficient training and dynamic shape inference of generative models, scientific computing models and other models. This provides a good balance between computational flexibility and high performance. The inference performance of Llama2 and Stable Diffusion models has been improved by 30% through automatic fusion of operators and code generation technology. 
-- High-order automatic differentiation: In order to better support scientific computing scenarios, PaddlePaddle Framework designs and implements high-order automatic differentiation technology based on combinatorial operator mechanism, combined with automatic optimization technology of neural network compiler. We have tested more than 40 differential equations in scientific computing scenarios, and its solution speed is 70% ahead of similar products in the industry. -- Highly scalable intermediate representation: In order to improve the scalability of the PaddlePaddle framework, we have developed a highly scalable Paddle Intermediate Representation (PIR).This representation systematically abstracts the underlying core concepts and provides flexible and efficient components. PIR serves as the infrastructure to support a number of technologies such as dynamic-to-static, automatic differentiation, auto parallel, combinatorial operators, and graph optimization; it is widely used in scenarios such as distributed training, model compression, and inference deployment. With the Declarative Rewrite Rule (DRR) mechanism provided by PIR, the development cost of Pass can be reduced by 60%.We have tested over 900 model configurations and the results show that the overall performance of inference improves by more than 10% after using PIR. -- Multi-Hardware adaptation: PaddlePaddle provides a well-functioning and low-cost solution for large model hardware adaptation. The new hardware only needs to be adapted with more than 30 interfaces to support training, compression and inference of large models. 
Meanwhile, PaddlePaddle provides compiler-based hardware access mode, and hardware vendors only need to implement the compiler's code generation back-end in the form of plug-ins to achieve efficient adaptation with the PaddlePaddle framework.PaddlePaddle hardware access this time has additional support for the daily release of four hardware units: Kunlun XPU, Ascend NPU, Hygon DCU and Cambricon MLU. +- **Unified Static and Dynamic Automatic Parallelism:** This feature significantly reduces the cost of industrial development and training. Users only need to perform a small amount of tensor slicing marking on a single card, and the PaddlePaddle framework will automatically derive the distributed slicing information and add communication operators to ensure logical correctness. At the same time, based on the model structure and cluster information, combined with the optimization of memory and scheduling layers, PaddlePaddle can automatically find the most efficient distributed parallel strategy, thereby significantly reducing the development cost of hybrid parallel training and enabling developers to focus more on model and algorithm innovation. The automatic parallel architecture has undergone in-depth verification and polishing to better support the pre-training + fine-tuning process for common large model scenarios such as pure dense models, pure sparse models (MoE), and multi-modal understanding models. It improves the slicing derivation rules of operators and supports converting automatic parallel training parameters into manual parallel parameters for downstream inference, achieving comprehensive usability and helping users reduce the development cost of large model parallel programs. Additionally, to further simplify the user's distributed development process, a new `paddle.distributed.parallel` interface is introduced. 
Based on the encapsulation of distributed tensor marking syntax, it supports users in non-intrusively configuring common parallel strategies such as data parallelism, model parallelism, and pipeline parallelism outside of the model networking. Furthermore, the static graph automatic parallel architecture has undergone a comprehensive upgrade based on PIR, with the underlying basic components, core modules, parallel strategies, and performance optimization strategies all implemented uniformly based on the extended PIR `DistDialect`, further enhancing the consistency of automatic parallelism between static and dynamic states, and achieving performance levels on the Llama series models that are on par with or even surpass manual parallel methods. +- **Integrated Training and Inference for Large Models:** Since version 2.0, PaddlePaddle has adopted the design philosophy of "unified dynamic and static, integrated training and inference," and version 3.0 will continue to uphold this philosophy. Thanks to the unified dynamic and static architecture and interface design, PaddlePaddle fully supports both dynamic and static graph modes, and possesses excellent whole-graph export capabilities. The success rate of whole-graph export from dynamic to static in PaddlePaddle is as high as 95%, surpassing PyTorch's 62%. "Integrated training and inference" means being able to reuse training and inference code, especially model networking code, within the same framework. After completing the development and training of the model, only a small amount of development work is required to achieve rapid inference deployment. This feature provides an ultimate development experience for the industry. It enables the reuse of training and inference capabilities, providing a unified development experience and ultimate training efficiency for the entire process of large models. Through the work of transitioning from dynamic to static, the training and inference tasks can be seamlessly connected. 
It supports multiple mainstream large models, and the DeepSeek-R1 full-scale version achieves single-machine deployment with doubled throughput.
+- **High-Order Differentiation for Scientific Computing:** PaddlePaddle Framework 3.0 provides support for high-order automatic differentiation, compilation optimization, and distributed training capabilities for scientific computing. Experiments on 41 different equations on NVIDIA Modulus show that the differential equation solving speed of PaddlePaddle is on average 115% faster than the version of PyTorch with compiler optimization enabled. Additionally, PaddlePaddle has established the PaddleScience toolkit for solving general mathematical problems and the PaddleHelix toolkit focused on biological computing. Furthermore, PaddlePaddle Framework 3.0 natively supports complex-number computation, which is of great significance for data feature analysis in scenarios such as weather forecasting and aerodynamic analysis of automobiles and aircraft.
+- **Neural Network Compiler:** This feature significantly reduces the cost of performance optimization. The compiler of PaddlePaddle adopts an integrated design with the framework, capable of supporting efficient training and variable-shape inference for various models such as generative models and scientific computing models, providing a good balance between computational flexibility and high performance. After using the CINN compiler, over 60% of the models have shown significant performance improvements, with an average increase of 27.4%. The CINN neural network compiler has been comprehensively improved in completeness and performance.
In this version, we have comprehensively optimized the front end and back end of the compiler, including adding an automatic Re-Compute mechanism for backward computation graphs, optimizing front-end Pass performance, upgrading the symbol derivation mechanism, optimizing operator fusion strategies, and enhancing the back-end Schedule strategy and subscript expression simplification capabilities. At the same time, we have investigated and fixed a large number of correctness and performance issues, systematically improving the general optimization capabilities of the compiler.
+- **Heterogeneous Multi-Chip Adaptation:** One of the key features of PaddlePaddle is its ability to adapt to heterogeneous multi-chip environments and fully leverage hardware potential. In terms of access mechanism, PaddlePaddle provides simple and efficient abstract interfaces and a basic operator system, reducing the cost of adaptation. In terms of operation mechanism, it optimizes scheduling and storage sharing mechanisms, enhancing scheduling efficiency. From the perspective of operator kernels, PaddlePaddle offers a compiler-based automatic fusion and tuning solution to improve end-to-end performance. Additionally, PaddlePaddle has established research and development infrastructure for new hardware vendors, including code integration, continuous integration, and model regression testing. These mechanisms ensure that new hardware is incorporated into PaddlePaddle's normal release system, allowing users to install and try it directly without the need for compilation. PaddlePaddle's comprehensive functionality and low-cost access mechanism have attracted hardware vendors to contribute a total of 4001 pull requests (PRs), encompassing 26584 commits.
-This version includes the continuous improvement of some of the existing features of the framework 2.x.
Meanwhile, the new features of this version bring significant improvements in terms of user experience, performance, ease of secondary development and hardware adaptability. In addition to the above core features, this version continues to enrich and enhance the API functions to meet more scenarios at the user experience level, optimizes and improves the distributed parallel strategy optimization and reasoning function enhancement for the large model scenarios, makes thorough improvement in terms of ease-of-use in compilation and installation, makes a new synchronous upgrade to the installation method and version of the dependency packages, strengthens the security of the system comprehensively, and makes comprehensive error-correction checks to the product documentation. We have also carried out a cleanup of some deprecated codes to ensure architectural simplicity. The performance of PaddlePaddle 3.0 Beta is still mature and stable without the use of new features, and each new feature provides a switch for flexible control, which makes it easy for users to quickly understand the related product features and experience comparison.
+In addition to the above core features, this version provides a **Highly Extensible Intermediate Representation**. To enhance the scalability of the PaddlePaddle framework, we have developed the highly extensible Paddle Intermediate Representation (PIR), which systematically abstracts the underlying core concepts and provides flexible and efficient components. As an infrastructure, PIR supports multiple technologies such as dynamic-to-static, automatic differentiation, automatic parallelization, combinational operators, and graph optimization, and is widely used in distributed training, model compression, and inference deployment scenarios. Through the Declarative Rewrite Rule (DRR) mechanism provided by PIR, the development cost of a Pass can be reduced by 60%.
At the same time, PIR has been verified in all scenarios and is enabled by default, supporting one-click dynamic-to-static conversion, ensuring excellent performance and good scalability of the framework. Continuous improvements have been made to the existing functions of the framework version 2.0, and new features have brought significant improvements in user experience, performance, ease of secondary development, and hardware adaptability. This version continues to enrich and enhance the API functions to meet more scenarios at the user experience level. For large model scenarios, the distributed parallel strategies and inference functions have been optimized and enhanced. Thorough usability improvements have been made in terms of compilation and installation, with a new synchronous upgrade of the installation method and version of dependent packages. Comprehensive reinforcement of system security has been carried out, and comprehensive error correction checks have been conducted on product documentation. At the same time, a large amount of cleanup has been done on some obsolete code to ensure the simplicity of the architecture.
-## User Experience Upgrade
+## Incompatible upgrade
-### Incompatibility Upgrade
+PaddlePaddle API supports implicit type promotion. In the most commonly used calculations such as addition, subtraction, multiplication, and division, if the data types of the two inputs are different, it is necessary to determine the data type of the output. Historically, PaddlePaddle only partially supported implicit type promotion, and the actual rules were unclear. In practice, this manifested as inconsistencies between dynamic and static graphs, inconsistencies between API and operator overloading, and violations of commutativity. Especially when large models widely use mixed calculations with bf16/fp16 and fp32, unexpected issues are prone to occur and are difficult to locate.
Starting from the 3.0 beta version, PaddlePaddle has clarified the [implicit data type promotion rules](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/advanced/auto_type_promotion_cn.html), which defines in detail the types of calculation results for Tensor and Tensor, as well as Tensor and a scalar (Scalar), ensuring that calculations comply with commutativity, operator overloading is consistent with binary API results, and dynamic graphs and static graphs produce consistent results. This is more in line with user understanding and industry habits. [#60638](https://github.com/PaddlePaddle/Paddle/pull/60638), [#63842](https://github.com/PaddlePaddle/Paddle/pull/63842), [#60011](https://github.com/PaddlePaddle/Paddle/pull/60011) -- PaddlePaddle API supports type promotion.In the most common calculations such as addition, subtraction, multiplication, and division, if the two inputs are of different data types, it is necessary to determine the data type of the output. Historically, PaddlePaddle partially supported this and the actual rules were not clear. Objectively, there were dynamic-static inconsistency, inconsistent API and operator overloading, and inconsistent interchange rates, and unexpected problems (hard to fix) especially in the case of large models using a mix of bf16/fp16 and fp32 for a wide range of calculations. Starting from the 3.0 beta, PaddlePaddle has clarified the [type promotion rules](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/advanced/auto_type_promotion_cn.html), and defined in detail the types of Tensor vs Tensor and Tensor vs. 1 number (Scalar) computation results, ensuring that the computation conforms to the exchange law, the operator overloading is consistent with the results of the binary API, and the results of the dynamic graph are consistent with those of the static graph. This is more in line with user understanding and industry practice. 
[#60638](https://github.com/PaddlePaddle/Paddle/pull/60638), [#63842](https://github.com/PaddlePaddle/Paddle/pull/63842), [#60011](https://github.com/PaddlePaddle/Paddle/pull/60011)
+## Discontinued Features
-### Deprecated Features
+Support for 0-dimensional Tensor has been stable for two versions. In this version, the switch `FLAGS_set_to_1d`, which converted a 0-dimensional Tensor to a 1-dimensional Tensor containing only one element in some cases, has been removed. This switch existed for compatibility with some toolkits that incorrectly used a 1-dimensional Tensor containing only one element to represent a 0-dimensional Tensor. That is, PaddlePaddle now fully distinguishes between the semantics of a 0-dimensional Tensor and a 1-dimensional Tensor containing only one element, and the two are not equivalent. [#61227](https://github.com/PaddlePaddle/Paddle/pull/61227)
-- There have been two versions stably supporting 0-dimensional Tensor. This version removes the switch `FLAGS_set_to_1d` that converts a 0-dimensional Tensor to a 1-dimensional Tensor with only 1 element in some cases. This switch is for compatibility with the incorrect way of writing a 1-element 1-dimensional Tensor to represent a 0-dimensional Tensor in some kits. That is, the current PaddlePaddle fully distinguish between the semantics of a 0-dimensional Tensor and a 1-dimensional Tensor with only 1 element, both are not equivalent. [#61227](https://github.com/PaddlePaddle/Paddle/pull/61227)
+## 1. User experience upgrade
+### New Features
-### New API Features
+- Added PaddlePaddle APIs to expand PaddlePaddle's functionality. These include `paddle.nn.FeatureAlphaDropout`, `paddle.cartesian_prod`, `paddle.distributed.to_distributed`, `paddle.pi`, etc.
[#64881](https://github.com/PaddlePaddle/Paddle/pull/64881), [#65605](https://github.com/PaddlePaddle/Paddle/pull/65605), [#70757](https://github.com/PaddlePaddle/Paddle/pull/70757), [#71030](https://github.com/PaddlePaddle/Paddle/pull/71030), [#69946](https://github.com/PaddlePaddle/Paddle/pull/69946), [#70021](https://github.com/PaddlePaddle/Paddle/pull/70021), [#69613](https://github.com/PaddlePaddle/Paddle/pull/69613), [#68123](https://github.com/PaddlePaddle/Paddle/pull/68123), [#70032](https://github.com/PaddlePaddle/Paddle/pull/70032) +- Introduce new Tensor class methods and attributes, along with corresponding unit tests, to enhance the usability of Tensor. [#68334](https://github.com/PaddlePaddle/Paddle/pull/68334), [#68681](https://github.com/PaddlePaddle/Paddle/pull/68681), [#69132](https://github.com/PaddlePaddle/Paddle/pull/69132), [#69270](https://github.com/PaddlePaddle/Paddle/pull/69270), [#69256](https://github.com/PaddlePaddle/Paddle/pull/69256), [#69197](https://github.com/PaddlePaddle/Paddle/pull/69197), [#69231](https://github.com/PaddlePaddle/Paddle/pull/69231), [#69222](https://github.com/PaddlePaddle/Paddle/pull/69222), [#69257](https://github.com/PaddlePaddle/Paddle/pull/69257), [#69301](https://github.com/PaddlePaddle/Paddle/pull/69301), [#69361](https://github.com/PaddlePaddle/Paddle/pull/69361), [#69348](https://github.com/PaddlePaddle/Paddle/pull/69348), [#69464](https://github.com/PaddlePaddle/Paddle/pull/69464), [#69542](https://github.com/PaddlePaddle/Paddle/pull/69542), [#69667](https://github.com/PaddlePaddle/Paddle/pull/69667), [#69563](https://github.com/PaddlePaddle/Paddle/pull/69563), [#69796](https://github.com/PaddlePaddle/Paddle/pull/69796), [#69477](https://github.com/PaddlePaddle/Paddle/pull/69477), [#69779](https://github.com/PaddlePaddle/Paddle/pull/69779), [#69724](https://github.com/PaddlePaddle/Paddle/pull/69724), [#69835](https://github.com/PaddlePaddle/Paddle/pull/69835), 
[#69781](https://github.com/PaddlePaddle/Paddle/pull/69781), [#69982](https://github.com/PaddlePaddle/Paddle/pull/69982), [#69913](https://github.com/PaddlePaddle/Paddle/pull/69913), [#70026](https://github.com/PaddlePaddle/Paddle/pull/70026), [#70013](https://github.com/PaddlePaddle/Paddle/pull/70013), [#69539](https://github.com/PaddlePaddle/Paddle/pull/69539), [#69736](https://github.com/PaddlePaddle/Paddle/pull/69736), [#69841](https://github.com/PaddlePaddle/Paddle/pull/69841), [#70277](https://github.com/PaddlePaddle/Paddle/pull/70277), [#69580](https://github.com/PaddlePaddle/Paddle/pull/69580), [#69599](https://github.com/PaddlePaddle/Paddle/pull/69599), [#69693](https://github.com/PaddlePaddle/Paddle/pull/69693), [#69848](https://github.com/PaddlePaddle/Paddle/pull/69848), [#69751](https://github.com/PaddlePaddle/Paddle/pull/69751), [#70556](https://github.com/PaddlePaddle/Paddle/pull/70556), [#70591](https://github.com/PaddlePaddle/Paddle/pull/70591), [#69673](https://github.com/PaddlePaddle/Paddle/pull/69673), [#70647](https://github.com/PaddlePaddle/Paddle/pull/70647), [#68192](https://github.com/PaddlePaddle/Paddle/pull/68192), [#68511](https://github.com/PaddlePaddle/Paddle/pull/68511), [#68833](https://github.com/PaddlePaddle/Paddle/pull/68833), [#69406](https://github.com/PaddlePaddle/Paddle/pull/69406), [#69480](https://github.com/PaddlePaddle/Paddle/pull/69480), [#69463](https://github.com/PaddlePaddle/Paddle/pull/69463), [#69632](https://github.com/PaddlePaddle/Paddle/pull/69632), [#69473](https://github.com/PaddlePaddle/Paddle/pull/69473), [#68694](https://github.com/PaddlePaddle/Paddle/pull/68694), [#69534](https://github.com/PaddlePaddle/Paddle/pull/69534), [#69820](https://github.com/PaddlePaddle/Paddle/pull/69820), [#70121](https://github.com/PaddlePaddle/Paddle/pull/70121) -- Add Tensor computation API. 
`paddle.gammaln`, `paddle.gammainc`, `paddle.gammaincc`, `paddle.sinc`, `paddle.pdist`, `paddle.histogramdd`,`paddle.signbit`, `paddle.copysign`, `paddle.bitwise_right_shift/bitwise_left_shift`, `paddle.isposinf/isneginf/isreal`, `paddle.isin`, `paddle.hsplit/dsplit`, `paddle.column_stack/row_stack/dstack/hstack/vstack`, `paddle.slice_scatter`, `paddle.masked_scatter` [#60553](https://github.com/PaddlePaddle/Paddle/pull/60553), [#59311](https://github.com/PaddlePaddle/Paddle/pull/59311), [#59357](https://github.com/PaddlePaddle/Paddle/pull/59357), [#63521](https://github.com/PaddlePaddle/Paddle/pull/63521), [#57869](https://github.com/PaddlePaddle/Paddle/pull/57869), [#57880](https://github.com/PaddlePaddle/Paddle/pull/57880), [#57882](https://github.com/PaddlePaddle/Paddle/pull/57882), [#60150](https://github.com/PaddlePaddle/Paddle/pull/60150), [#57785](https://github.com/PaddlePaddle/Paddle/pull/57785), [#58092](https://github.com/PaddlePaddle/Paddle/pull/58092), [#63523](https://github.com/PaddlePaddle/Paddle/pull/63523), [#64001](https://github.com/PaddlePaddle/Paddle/pull/64001), [#58917](https://github.com/PaddlePaddle/Paddle/pull/58917), [#59127](https://github.com/PaddlePaddle/Paddle/pull/59127), [#59973](https://github.com/PaddlePaddle/Paddle/pull/59973), [#59383](https://github.com/PaddlePaddle/Paddle/pull/59383) -- Add probability distribution API. `paddle.distribution.ContinuousBernoulli`, `paddle.distribution.MultivariateNormal`, `paddle.distribution.Exponential`, `paddle.distribution.Gamma`, `paddle.distribution.Binomial`, `paddle.distribution.Poisson` [#58004](https://github.com/PaddlePaddle/Paddle/pull/58004), [#57899](https://github.com/PaddlePaddle/Paddle/pull/57899), [#57856](https://github.com/PaddlePaddle/Paddle/pull/57856) -- Add optimizer API. 
`paddle.optimizer.ASGD`, `paddle.optimizer.NAdam`, `paddle.optimizer.RAdam`, `paddle.optimizer.Rprop` [#58834](https://github.com/PaddlePaddle/Paddle/pull/58834), [#63671](https://github.com/PaddlePaddle/Paddle/pull/63671), [#58851](https://github.com/PaddlePaddle/Paddle/pull/58851) -- Add Linear Algebra API. `paddle.linalg.matrix_exp` [#59715](https://github.com/PaddlePaddle/Paddle/pull/59715) -- Add other APIs. `paddle.bernoulli_`, `paddle.nn.ZeroPad1D/ZeroPad3D`, `paddle.nn.AdaptiveLogSoftmaxWithLoss`, `paddle.Tensor.apply` [#64252](https://github.com/PaddlePaddle/Paddle/pull/64252), [#59690](https://github.com/PaddlePaddle/Paddle/pull/59690), [#63728](https://github.com/PaddlePaddle/Paddle/pull/63728), [#63302](https://github.com/PaddlePaddle/Paddle/pull/63302), [#59374](https://github.com/PaddlePaddle/Paddle/pull/59374),[#63227](https://github.com/PaddlePaddle/Paddle/pull/63227) +### API Function Enhancement -### Some API Enhancements +- Enhanced the functionality of 43 APIs, making existing APIs easier to use and facilitating code conversion. This includes but is not limited to adding API parameters, expanding the data types supported by APIs, and correcting existing unreasonable designs. 
[#65105](https://github.com/PaddlePaddle/Paddle/pull/65105), [#65103](https://github.com/PaddlePaddle/Paddle/pull/65103), [#62975](https://github.com/PaddlePaddle/Paddle/pull/62975), [#64436](https://github.com/PaddlePaddle/Paddle/pull/64436), [#63346](https://github.com/PaddlePaddle/Paddle/pull/63346), [#68079](https://github.com/PaddlePaddle/Paddle/pull/68079), [#67878](https://github.com/PaddlePaddle/Paddle/pull/67878), [#68432](https://github.com/PaddlePaddle/Paddle/pull/68432), [#68677](https://github.com/PaddlePaddle/Paddle/pull/68677), [#69012](https://github.com/PaddlePaddle/Paddle/pull/69012), [#69385](https://github.com/PaddlePaddle/Paddle/pull/69385), [#65032](https://github.com/PaddlePaddle/Paddle/pull/65032), [#64977](https://github.com/PaddlePaddle/Paddle/pull/64977), [#67071](https://github.com/PaddlePaddle/Paddle/pull/67071), [#67298](https://github.com/PaddlePaddle/Paddle/pull/67298), [#66687](https://github.com/PaddlePaddle/Paddle/pull/66687), [#65946](https://github.com/PaddlePaddle/Paddle/pull/65946), [#66170](https://github.com/PaddlePaddle/Paddle/pull/66170), [#66929](https://github.com/PaddlePaddle/Paddle/pull/66929), [#67994](https://github.com/PaddlePaddle/Paddle/pull/67994), [#67947](https://github.com/PaddlePaddle/Paddle/pull/67947), [#68033](https://github.com/PaddlePaddle/Paddle/pull/68033), [#68046](https://github.com/PaddlePaddle/Paddle/pull/68046), [#68294](https://github.com/PaddlePaddle/Paddle/pull/68294), [#68214](https://github.com/PaddlePaddle/Paddle/pull/68214), [#68281](https://github.com/PaddlePaddle/Paddle/pull/68281), [#68390](https://github.com/PaddlePaddle/Paddle/pull/68390), [#68772](https://github.com/PaddlePaddle/Paddle/pull/68772), [#69451](https://github.com/PaddlePaddle/Paddle/pull/69451), [#69252](https://github.com/PaddlePaddle/Paddle/pull/69252), [#69529](https://github.com/PaddlePaddle/Paddle/pull/69529), [#69750](https://github.com/PaddlePaddle/Paddle/pull/69750), 
[#69827](https://github.com/PaddlePaddle/Paddle/pull/69827), [#69099](https://github.com/PaddlePaddle/Paddle/pull/69099), [#68594](https://github.com/PaddlePaddle/Paddle/pull/68594), [#70090](https://github.com/PaddlePaddle/Paddle/pull/70090), [#70228](https://github.com/PaddlePaddle/Paddle/pull/70228), [#70166](https://github.com/PaddlePaddle/Paddle/pull/70166), [#70389](https://github.com/PaddlePaddle/Paddle/pull/70389), [#70790](https://github.com/PaddlePaddle/Paddle/pull/70790), [#71029](https://github.com/PaddlePaddle/Paddle/pull/71029), [#71283](https://github.com/PaddlePaddle/Paddle/pull/71283), [#71342](https://github.com/PaddlePaddle/Paddle/pull/71342) +- PaddlePaddle Python API fully supports type hints. All parameters and return values of Python API have been annotated with type hints for ease of development and use. [#65209](https://github.com/PaddlePaddle/Paddle/pull/65209), [#65201](https://github.com/PaddlePaddle/Paddle/pull/65201), [#65190](https://github.com/PaddlePaddle/Paddle/pull/65190), [#65082](https://github.com/PaddlePaddle/Paddle/pull/65082), [#65226](https://github.com/PaddlePaddle/Paddle/pull/65226), [#65076](https://github.com/PaddlePaddle/Paddle/pull/65076), [#65238](https://github.com/PaddlePaddle/Paddle/pull/65238), [#65236](https://github.com/PaddlePaddle/Paddle/pull/65236), [#65247](https://github.com/PaddlePaddle/Paddle/pull/65247), [#65249](https://github.com/PaddlePaddle/Paddle/pull/65249), [#65244](https://github.com/PaddlePaddle/Paddle/pull/65244), [#65272](https://github.com/PaddlePaddle/Paddle/pull/65272), [#65191](https://github.com/PaddlePaddle/Paddle/pull/65191), [#65290](https://github.com/PaddlePaddle/Paddle/pull/65290), [#65255](https://github.com/PaddlePaddle/Paddle/pull/65255), [#65292](https://github.com/PaddlePaddle/Paddle/pull/65292), [#65300](https://github.com/PaddlePaddle/Paddle/pull/65300), [#65301](https://github.com/PaddlePaddle/Paddle/pull/65301), [#65332](https://github.com/PaddlePaddle/Paddle/pull/65332), 
[#65323](https://github.com/PaddlePaddle/Paddle/pull/65323), [#65326](https://github.com/PaddlePaddle/Paddle/pull/65326), [#65273](https://github.com/PaddlePaddle/Paddle/pull/65273), [#65317](https://github.com/PaddlePaddle/Paddle/pull/65317), [#65354](https://github.com/PaddlePaddle/Paddle/pull/65354), [#65283](https://github.com/PaddlePaddle/Paddle/pull/65283), [#65372](https://github.com/PaddlePaddle/Paddle/pull/65372), [#65337](https://github.com/PaddlePaddle/Paddle/pull/65337), [#65085](https://github.com/PaddlePaddle/Paddle/pull/65085), [#65382](https://github.com/PaddlePaddle/Paddle/pull/65382), [#65381](https://github.com/PaddlePaddle/Paddle/pull/65381), [#65378](https://github.com/PaddlePaddle/Paddle/pull/65378), [#65274](https://github.com/PaddlePaddle/Paddle/pull/65274), [#65380](https://github.com/PaddlePaddle/Paddle/pull/65380), [#65386](https://github.com/PaddlePaddle/Paddle/pull/65386), [#65351](https://github.com/PaddlePaddle/Paddle/pull/65351), [#65284](https://github.com/PaddlePaddle/Paddle/pull/65284), [#65366](https://github.com/PaddlePaddle/Paddle/pull/65366), [#65308](https://github.com/PaddlePaddle/Paddle/pull/65308), [#65375](https://github.com/PaddlePaddle/Paddle/pull/65375), [#65376](https://github.com/PaddlePaddle/Paddle/pull/65376), [#65464](https://github.com/PaddlePaddle/Paddle/pull/65464), [#65197](https://github.com/PaddlePaddle/Paddle/pull/65197), [#65455](https://github.com/PaddlePaddle/Paddle/pull/65455), [#65457](https://github.com/PaddlePaddle/Paddle/pull/65457), [#65487](https://github.com/PaddlePaddle/Paddle/pull/65487), [#65486](https://github.com/PaddlePaddle/Paddle/pull/65486), [#65547](https://github.com/PaddlePaddle/Paddle/pull/65547), [#65504](https://github.com/PaddlePaddle/Paddle/pull/65504), [#65460](https://github.com/PaddlePaddle/Paddle/pull/65460), [#65183](https://github.com/PaddlePaddle/Paddle/pull/65183), [#65454](https://github.com/PaddlePaddle/Paddle/pull/65454), 
[#65559](https://github.com/PaddlePaddle/Paddle/pull/65559), [#65560](https://github.com/PaddlePaddle/Paddle/pull/65560), [#65570](https://github.com/PaddlePaddle/Paddle/pull/65570), [#65569](https://github.com/PaddlePaddle/Paddle/pull/65569), [#65566](https://github.com/PaddlePaddle/Paddle/pull/65566), [#65620](https://github.com/PaddlePaddle/Paddle/pull/65620), [#65568](https://github.com/PaddlePaddle/Paddle/pull/65568), [#65567](https://github.com/PaddlePaddle/Paddle/pull/65567), [#65660](https://github.com/PaddlePaddle/Paddle/pull/65660), [#65645](https://github.com/PaddlePaddle/Paddle/pull/65645), [#65600](https://github.com/PaddlePaddle/Paddle/pull/65600), [#65532](https://github.com/PaddlePaddle/Paddle/pull/65532), [#65765](https://github.com/PaddlePaddle/Paddle/pull/65765), [#65767](https://github.com/PaddlePaddle/Paddle/pull/65767), [#65770](https://github.com/PaddlePaddle/Paddle/pull/65770), [#65768](https://github.com/PaddlePaddle/Paddle/pull/65768), [#65771](https://github.com/PaddlePaddle/Paddle/pull/65771), [#65772](https://github.com/PaddlePaddle/Paddle/pull/65772), [#65774](https://github.com/PaddlePaddle/Paddle/pull/65774), [#65769](https://github.com/PaddlePaddle/Paddle/pull/65769), [#65773](https://github.com/PaddlePaddle/Paddle/pull/65773), [#65766](https://github.com/PaddlePaddle/Paddle/pull/65766), [#65776](https://github.com/PaddlePaddle/Paddle/pull/65776), [#65775](https://github.com/PaddlePaddle/Paddle/pull/65775), [#65755](https://github.com/PaddlePaddle/Paddle/pull/65755), [#65779](https://github.com/PaddlePaddle/Paddle/pull/65779), [#65777](https://github.com/PaddlePaddle/Paddle/pull/65777), [#65823](https://github.com/PaddlePaddle/Paddle/pull/65823), [#65807](https://github.com/PaddlePaddle/Paddle/pull/65807), [#65821](https://github.com/PaddlePaddle/Paddle/pull/65821), [#65819](https://github.com/PaddlePaddle/Paddle/pull/65819), [#65810](https://github.com/PaddlePaddle/Paddle/pull/65810), 
[#65808](https://github.com/PaddlePaddle/Paddle/pull/65808), [#65824](https://github.com/PaddlePaddle/Paddle/pull/65824), [#65553](https://github.com/PaddlePaddle/Paddle/pull/65553), [#65818](https://github.com/PaddlePaddle/Paddle/pull/65818), [#65812](https://github.com/PaddlePaddle/Paddle/pull/65812), [#65803](https://github.com/PaddlePaddle/Paddle/pull/65803), [#65865](https://github.com/PaddlePaddle/Paddle/pull/65865), [#65870](https://github.com/PaddlePaddle/Paddle/pull/65870), [#65866](https://github.com/PaddlePaddle/Paddle/pull/65866), [#65844](https://github.com/PaddlePaddle/Paddle/pull/65844), [#65845](https://github.com/PaddlePaddle/Paddle/pull/65845), [#65853](https://github.com/PaddlePaddle/Paddle/pull/65853), [#65874](https://github.com/PaddlePaddle/Paddle/pull/65874), [#65871](https://github.com/PaddlePaddle/Paddle/pull/65871), [#65809](https://github.com/PaddlePaddle/Paddle/pull/65809), [#65867](https://github.com/PaddlePaddle/Paddle/pull/65867), [#65822](https://github.com/PaddlePaddle/Paddle/pull/65822), [#65872](https://github.com/PaddlePaddle/Paddle/pull/65872), [#65873](https://github.com/PaddlePaddle/Paddle/pull/65873), [#65869](https://github.com/PaddlePaddle/Paddle/pull/65869), [#65868](https://github.com/PaddlePaddle/Paddle/pull/65868), [#65849](https://github.com/PaddlePaddle/Paddle/pull/65849), [#65875](https://github.com/PaddlePaddle/Paddle/pull/65875), [#65876](https://github.com/PaddlePaddle/Paddle/pull/65876), [#65843](https://github.com/PaddlePaddle/Paddle/pull/65843), [#65727](https://github.com/PaddlePaddle/Paddle/pull/65727), [#65587](https://github.com/PaddlePaddle/Paddle/pull/65587), [#66006](https://github.com/PaddlePaddle/Paddle/pull/66006), [#66005](https://github.com/PaddlePaddle/Paddle/pull/66005), [#65785](https://github.com/PaddlePaddle/Paddle/pull/65785), [#65784](https://github.com/PaddlePaddle/Paddle/pull/65784), [#65811](https://github.com/PaddlePaddle/Paddle/pull/65811), 
[#65919](https://github.com/PaddlePaddle/Paddle/pull/65919), [#65838](https://github.com/PaddlePaddle/Paddle/pull/65838), [#65852](https://github.com/PaddlePaddle/Paddle/pull/65852), [#65847](https://github.com/PaddlePaddle/Paddle/pull/65847), [#66014](https://github.com/PaddlePaddle/Paddle/pull/66014), [#65805](https://github.com/PaddlePaddle/Paddle/pull/65805), [#66009](https://github.com/PaddlePaddle/Paddle/pull/66009), [#66012](https://github.com/PaddlePaddle/Paddle/pull/66012), [#65633](https://github.com/PaddlePaddle/Paddle/pull/65633), [#66011](https://github.com/PaddlePaddle/Paddle/pull/66011), [#66010](https://github.com/PaddlePaddle/Paddle/pull/66010), [#66013](https://github.com/PaddlePaddle/Paddle/pull/66013), [#66015](https://github.com/PaddlePaddle/Paddle/pull/66015), [#66016](https://github.com/PaddlePaddle/Paddle/pull/66016), [#66030](https://github.com/PaddlePaddle/Paddle/pull/66030), [#66028](https://github.com/PaddlePaddle/Paddle/pull/66028), [#66029](https://github.com/PaddlePaddle/Paddle/pull/66029), [#66054](https://github.com/PaddlePaddle/Paddle/pull/66054), [#66040](https://github.com/PaddlePaddle/Paddle/pull/66040), [#65993](https://github.com/PaddlePaddle/Paddle/pull/65993), [#66058](https://github.com/PaddlePaddle/Paddle/pull/66058), [#66280](https://github.com/PaddlePaddle/Paddle/pull/66280), [#66037](https://github.com/PaddlePaddle/Paddle/pull/66037), [#66057](https://github.com/PaddlePaddle/Paddle/pull/66057), [#66077](https://github.com/PaddlePaddle/Paddle/pull/66077), [#66051](https://github.com/PaddlePaddle/Paddle/pull/66051), [#65912](https://github.com/PaddlePaddle/Paddle/pull/65912), [#66090](https://github.com/PaddlePaddle/Paddle/pull/66090), [#66189](https://github.com/PaddlePaddle/Paddle/pull/66189), [#66127](https://github.com/PaddlePaddle/Paddle/pull/66127), [#66277](https://github.com/PaddlePaddle/Paddle/pull/66277), [#66119](https://github.com/PaddlePaddle/Paddle/pull/66119), 
[#66270](https://github.com/PaddlePaddle/Paddle/pull/66270), [#66305](https://github.com/PaddlePaddle/Paddle/pull/66305), [#66306](https://github.com/PaddlePaddle/Paddle/pull/66306), [#66279](https://github.com/PaddlePaddle/Paddle/pull/66279), [#66276](https://github.com/PaddlePaddle/Paddle/pull/66276), [#66295](https://github.com/PaddlePaddle/Paddle/pull/66295), [#66301](https://github.com/PaddlePaddle/Paddle/pull/66301), [#66473](https://github.com/PaddlePaddle/Paddle/pull/66473), [#66384](https://github.com/PaddlePaddle/Paddle/pull/66384), [#66505](https://github.com/PaddlePaddle/Paddle/pull/66505), [#66328](https://github.com/PaddlePaddle/Paddle/pull/66328), [#66394](https://github.com/PaddlePaddle/Paddle/pull/66394), [#66392](https://github.com/PaddlePaddle/Paddle/pull/66392), [#66432](https://github.com/PaddlePaddle/Paddle/pull/66432), [#66575](https://github.com/PaddlePaddle/Paddle/pull/66575), [#66572](https://github.com/PaddlePaddle/Paddle/pull/66572), [#66656](https://github.com/PaddlePaddle/Paddle/pull/66656), [#66475](https://github.com/PaddlePaddle/Paddle/pull/66475), [#66654](https://github.com/PaddlePaddle/Paddle/pull/66654), [#66616](https://github.com/PaddlePaddle/Paddle/pull/66616), [#66694](https://github.com/PaddlePaddle/Paddle/pull/66694), [#66686](https://github.com/PaddlePaddle/Paddle/pull/66686), [#66766](https://github.com/PaddlePaddle/Paddle/pull/66766), [#66749](https://github.com/PaddlePaddle/Paddle/pull/66749), [#66760](https://github.com/PaddlePaddle/Paddle/pull/66760), [#66803](https://github.com/PaddlePaddle/Paddle/pull/66803), [#66770](https://github.com/PaddlePaddle/Paddle/pull/66770), [#66693](https://github.com/PaddlePaddle/Paddle/pull/66693), [#66771](https://github.com/PaddlePaddle/Paddle/pull/66771), [#66792](https://github.com/PaddlePaddle/Paddle/pull/66792), [#66862](https://github.com/PaddlePaddle/Paddle/pull/66862), [#66867](https://github.com/PaddlePaddle/Paddle/pull/66867), 
[#66684](https://github.com/PaddlePaddle/Paddle/pull/66684), [#66966](https://github.com/PaddlePaddle/Paddle/pull/66966), [#66793](https://github.com/PaddlePaddle/Paddle/pull/66793), [#66987](https://github.com/PaddlePaddle/Paddle/pull/66987), [#66985](https://github.com/PaddlePaddle/Paddle/pull/66985), [#66989](https://github.com/PaddlePaddle/Paddle/pull/66989), [#66639](https://github.com/PaddlePaddle/Paddle/pull/66639), [#66994](https://github.com/PaddlePaddle/Paddle/pull/66994), [#66986](https://github.com/PaddlePaddle/Paddle/pull/66986), [#66993](https://github.com/PaddlePaddle/Paddle/pull/66993), [#67002](https://github.com/PaddlePaddle/Paddle/pull/67002), [#66996](https://github.com/PaddlePaddle/Paddle/pull/66996), [#67001](https://github.com/PaddlePaddle/Paddle/pull/67001), [#66864](https://github.com/PaddlePaddle/Paddle/pull/66864), [#67031](https://github.com/PaddlePaddle/Paddle/pull/67031), [#67089](https://github.com/PaddlePaddle/Paddle/pull/67089), [#67143](https://github.com/PaddlePaddle/Paddle/pull/67143), [#67179](https://github.com/PaddlePaddle/Paddle/pull/67179), [#67178](https://github.com/PaddlePaddle/Paddle/pull/67178), [#67284](https://github.com/PaddlePaddle/Paddle/pull/67284), [#67104](https://github.com/PaddlePaddle/Paddle/pull/67104), [#67079](https://github.com/PaddlePaddle/Paddle/pull/67079), [#67132](https://github.com/PaddlePaddle/Paddle/pull/67132), [#67147](https://github.com/PaddlePaddle/Paddle/pull/67147), [#67204](https://github.com/PaddlePaddle/Paddle/pull/67204), [#67112](https://github.com/PaddlePaddle/Paddle/pull/67112), [#67233](https://github.com/PaddlePaddle/Paddle/pull/67233), [#67366](https://github.com/PaddlePaddle/Paddle/pull/67366), [#67067](https://github.com/PaddlePaddle/Paddle/pull/67067), [#67391](https://github.com/PaddlePaddle/Paddle/pull/67391), [#67428](https://github.com/PaddlePaddle/Paddle/pull/67428), [#67197](https://github.com/PaddlePaddle/Paddle/pull/67197), 
[#67047](https://github.com/PaddlePaddle/Paddle/pull/67047), [#66890](https://github.com/PaddlePaddle/Paddle/pull/66890), [#67159](https://github.com/PaddlePaddle/Paddle/pull/67159), [#67439](https://github.com/PaddlePaddle/Paddle/pull/67439), [#67555](https://github.com/PaddlePaddle/Paddle/pull/67555), [#67448](https://github.com/PaddlePaddle/Paddle/pull/67448), [#67556](https://github.com/PaddlePaddle/Paddle/pull/67556), [#67469](https://github.com/PaddlePaddle/Paddle/pull/67469), [#67558](https://github.com/PaddlePaddle/Paddle/pull/67558), [#67405](https://github.com/PaddlePaddle/Paddle/pull/67405), [#67644](https://github.com/PaddlePaddle/Paddle/pull/67644), [#67624](https://github.com/PaddlePaddle/Paddle/pull/67624), [#67679](https://github.com/PaddlePaddle/Paddle/pull/67679), [#67677](https://github.com/PaddlePaddle/Paddle/pull/67677), [#67785](https://github.com/PaddlePaddle/Paddle/pull/67785), [#67767](https://github.com/PaddlePaddle/Paddle/pull/67767), [#65319](https://github.com/PaddlePaddle/Paddle/pull/65319), [#65277](https://github.com/PaddlePaddle/Paddle/pull/65277), [#67673](https://github.com/PaddlePaddle/Paddle/pull/67673), [#65557](https://github.com/PaddlePaddle/Paddle/pull/65557), [#67527](https://github.com/PaddlePaddle/Paddle/pull/67527), [#66965](https://github.com/PaddlePaddle/Paddle/pull/66965), [#65905](https://github.com/PaddlePaddle/Paddle/pull/65905), [#65657](https://github.com/PaddlePaddle/Paddle/pull/65657), [#66357](https://github.com/PaddlePaddle/Paddle/pull/66357), [#68163](https://github.com/PaddlePaddle/Paddle/pull/68163) +- Optimized the error messages of many PaddlePaddle APIs, making the errors more understandable. 
[#67148](https://github.com/PaddlePaddle/Paddle/pull/67148), [#67154](https://github.com/PaddlePaddle/Paddle/pull/67154), [#67546](https://github.com/PaddlePaddle/Paddle/pull/67546), [#67335](https://github.com/PaddlePaddle/Paddle/pull/67335), [#67255](https://github.com/PaddlePaddle/Paddle/pull/67255), [#67099](https://github.com/PaddlePaddle/Paddle/pull/67099), [#67074](https://github.com/PaddlePaddle/Paddle/pull/67074), [#67073](https://github.com/PaddlePaddle/Paddle/pull/67073), [#66957](https://github.com/PaddlePaddle/Paddle/pull/66957), [#67063](https://github.com/PaddlePaddle/Paddle/pull/67063), [#67575](https://github.com/PaddlePaddle/Paddle/pull/67575), [#67608](https://github.com/PaddlePaddle/Paddle/pull/67608), [#67634](https://github.com/PaddlePaddle/Paddle/pull/67634), [#67325](https://github.com/PaddlePaddle/Paddle/pull/67325), [#67429](https://github.com/PaddlePaddle/Paddle/pull/67429), [#67401](https://github.com/PaddlePaddle/Paddle/pull/67401), [#66881](https://github.com/PaddlePaddle/Paddle/pull/66881), [#68492](https://github.com/PaddlePaddle/Paddle/pull/68492), [#67695](https://github.com/PaddlePaddle/Paddle/pull/67695), [#69833](https://github.com/PaddlePaddle/Paddle/pull/69833), [#70398](https://github.com/PaddlePaddle/Paddle/pull/70398) -- Enhance about 30 APIs to support complex number computation, such as `paddle.log`, `paddle.log1p`, `paddle.square`, and `paddle.reciprocal`, to extend the support for more scientific computing scenarios. 
[#62448](https://github.com/PaddlePaddle/Paddle/pull/62448), [#60821](https://github.com/PaddlePaddle/Paddle/pull/60821), [#60897](https://github.com/PaddlePaddle/Paddle/pull/60897), [#62764](https://github.com/PaddlePaddle/Paddle/pull/62764), [#59536](https://github.com/PaddlePaddle/Paddle/pull/59536), [#59529](https://github.com/PaddlePaddle/Paddle/pull/59529), [#63207](https://github.com/PaddlePaddle/Paddle/pull/63207), [#62237](https://github.com/PaddlePaddle/Paddle/pull/62237), [#64684](https://github.com/PaddlePaddle/Paddle/pull/64684) -- Enhance 46 APIs, to make existing APIs easier to use and easier to convert to codes,including but not limited to, adding API parameters, extending the data types supported by the APIs, and fixing the existing unreasonable designs. [#59890](https://github.com/PaddlePaddle/Paddle/pull/59890), [#63513](https://github.com/PaddlePaddle/Paddle/pull/63513), [#59674](https://github.com/PaddlePaddle/Paddle/pull/59674), [#62778](https://github.com/PaddlePaddle/Paddle/pull/62778), [#64110](https://github.com/PaddlePaddle/Paddle/pull/64110), [#63222](https://github.com/PaddlePaddle/Paddle/pull/63222), [#64331](https://github.com/PaddlePaddle/Paddle/pull/64331), [#64715](https://github.com/PaddlePaddle/Paddle/pull/64715), [#61155](https://github.com/PaddlePaddle/Paddle/pull/61155), [#60070](https://github.com/PaddlePaddle/Paddle/pull/60070), [#61974](https://github.com/PaddlePaddle/Paddle/pull/61974), [#62407](https://github.com/PaddlePaddle/Paddle/pull/62407), [#62672](https://github.com/PaddlePaddle/Paddle/pull/62672),[#62722](https://github.com/PaddlePaddle/Paddle/pull/62722), [#62876](https://github.com/PaddlePaddle/Paddle/pull/62876), [#63284](https://github.com/PaddlePaddle/Paddle/pull/63284), [#63860](https://github.com/PaddlePaddle/Paddle/pull/63860), [#60466](https://github.com/PaddlePaddle/Paddle/pull/60466), [#63690](https://github.com/PaddlePaddle/Paddle/pull/63690), 
[#63953](https://github.com/PaddlePaddle/Paddle/pull/63953), [#63901](https://github.com/PaddlePaddle/Paddle/pull/63901), [#62624](https://github.com/PaddlePaddle/Paddle/pull/62624), [#59857](https://github.com/PaddlePaddle/Paddle/pull/59857), [#60084](https://github.com/PaddlePaddle/Paddle/pull/60084), [#60766](https://github.com/PaddlePaddle/Paddle/pull/60766), [#62788](https://github.com/PaddlePaddle/Paddle/pull/62788), [#62937](https://github.com/PaddlePaddle/Paddle/pull/62937), [#63134](https://github.com/PaddlePaddle/Paddle/pull/63134), [#62966](https://github.com/PaddlePaddle/Paddle/pull/62966), [#63648](https://github.com/PaddlePaddle/Paddle/pull/63648), [#63881](https://github.com/PaddlePaddle/Paddle/pull/63881), [#64358](https://github.com/PaddlePaddle/Paddle/pull/64358), [#60503](https://github.com/PaddlePaddle/Paddle/pull/60503), [#63604](https://github.com/PaddlePaddle/Paddle/pull/63604), [#62338](https://github.com/PaddlePaddle/Paddle/pull/62338) -- Enhance single-test infrastructure for higher-order differentiation, making it easier to add single-test use cases for higher-order differentiation. [#62074](https://github.com/PaddlePaddle/Paddle/pull/62074) +### Bug Fixes -### API Performance Improvements +- Fixed a bug in `paddle.nn.functional.max_unpool1d` when the input `output_size` is a tuple. [#65910](https://github.com/PaddlePaddle/Paddle/pull/65910) +- Fixed the issue where `paddle.base.core.eager.Tensor` did not support `paddle::DataType`. [#66765](https://github.com/PaddlePaddle/Paddle/pull/66765) +- Fixed an error that occurred during BF16 training when the PIR switch was turned on. [#66833](https://github.com/PaddlePaddle/Paddle/pull/66833) +- Fixed a bias issue in the linear layer under pipeline parallelism. [#67212](https://github.com/PaddlePaddle/Paddle/pull/67212) +- Fixed an error when using the loss in a conditional check under pipeline parallelism. 
[#66980](https://github.com/PaddlePaddle/Paddle/pull/66980) +- Fixed an error when using `paddle.Tensor.item` under pipeline parallelism. [#67441](https://github.com/PaddlePaddle/Paddle/pull/67441) +- Fixed bugs in `paddle.einsum` in specific scenarios. [#67588](https://github.com/PaddlePaddle/Paddle/pull/67588) +- Fixed an error in `paddle.nn.SyncBatchNorm` during gradient computation. [#67559](https://github.com/PaddlePaddle/Paddle/pull/67559) +- Fixed the issue reported in [issue #69992](https://github.com/PaddlePaddle/Paddle/issues/69992). [#70017](https://github.com/PaddlePaddle/Paddle/pull/70017) +- Fixed the issue where `paddle.arange` produced incorrect results for large integers. [#70188](https://github.com/PaddlePaddle/Paddle/pull/70188) +- Fixed the issue where `paddle.max` and `paddle.min` propagated incorrectly when the input contained NaN values. [#70049](https://github.com/PaddlePaddle/Paddle/pull/70049) +- Fixed issues with APIs such as `paddle.linalg.svd` and `paddle.linalg.any` when handling 0-size Tensors. 
[#70235](https://github.com/PaddlePaddle/Paddle/pull/70235), [#70489](https://github.com/PaddlePaddle/Paddle/pull/70489), [#70047](https://github.com/PaddlePaddle/Paddle/pull/70047), [#70103](https://github.com/PaddlePaddle/Paddle/pull/70103), [#70127](https://github.com/PaddlePaddle/Paddle/pull/70127), [#70098](https://github.com/PaddlePaddle/Paddle/pull/70098), [#70077](https://github.com/PaddlePaddle/Paddle/pull/70077), [#70130](https://github.com/PaddlePaddle/Paddle/pull/70130), [#70254](https://github.com/PaddlePaddle/Paddle/pull/70254), [#70125](https://github.com/PaddlePaddle/Paddle/pull/70125), [#70342](https://github.com/PaddlePaddle/Paddle/pull/70342), [#70369](https://github.com/PaddlePaddle/Paddle/pull/70369), [#71094](https://github.com/PaddlePaddle/Paddle/pull/71094), [#71089](https://github.com/PaddlePaddle/Paddle/pull/71089), [#71185](https://github.com/PaddlePaddle/Paddle/pull/71185), [#70537](https://github.com/PaddlePaddle/Paddle/pull/70537), [#70481](https://github.com/PaddlePaddle/Paddle/pull/70481) +- Fixed some issues with type hint annotations and documentation issues. 
[#65429](https://github.com/PaddlePaddle/Paddle/pull/65429), [#65496](https://github.com/PaddlePaddle/Paddle/pull/65496), [#65461](https://github.com/PaddlePaddle/Paddle/pull/65461), [#65542](https://github.com/PaddlePaddle/Paddle/pull/65542), [#65575](https://github.com/PaddlePaddle/Paddle/pull/65575), [#65545](https://github.com/PaddlePaddle/Paddle/pull/65545), [#65609](https://github.com/PaddlePaddle/Paddle/pull/65609), [#65644](https://github.com/PaddlePaddle/Paddle/pull/65644), [#65700](https://github.com/PaddlePaddle/Paddle/pull/65700), [#65697](https://github.com/PaddlePaddle/Paddle/pull/65697), [#65719](https://github.com/PaddlePaddle/Paddle/pull/65719), [#65639](https://github.com/PaddlePaddle/Paddle/pull/65639), [#65742](https://github.com/PaddlePaddle/Paddle/pull/65742), [#65891](https://github.com/PaddlePaddle/Paddle/pull/65891), [#65877](https://github.com/PaddlePaddle/Paddle/pull/65877), [#65895](https://github.com/PaddlePaddle/Paddle/pull/65895), [#66007](https://github.com/PaddlePaddle/Paddle/pull/66007), [#66679](https://github.com/PaddlePaddle/Paddle/pull/66679), [#66680](https://github.com/PaddlePaddle/Paddle/pull/66680), [#66676](https://github.com/PaddlePaddle/Paddle/pull/66676), [#66677](https://github.com/PaddlePaddle/Paddle/pull/66677), [#66884](https://github.com/PaddlePaddle/Paddle/pull/66884), [#67288](https://github.com/PaddlePaddle/Paddle/pull/67288), [#67302](https://github.com/PaddlePaddle/Paddle/pull/67302), [#66978](https://github.com/PaddlePaddle/Paddle/pull/66978), [#67295](https://github.com/PaddlePaddle/Paddle/pull/67295), [#67520](https://github.com/PaddlePaddle/Paddle/pull/67520), [#67421](https://github.com/PaddlePaddle/Paddle/pull/67421), [#67529](https://github.com/PaddlePaddle/Paddle/pull/67529), [#67536](https://github.com/PaddlePaddle/Paddle/pull/67536), [#67618](https://github.com/PaddlePaddle/Paddle/pull/67618), [#67661](https://github.com/PaddlePaddle/Paddle/pull/67661), 
[#67698](https://github.com/PaddlePaddle/Paddle/pull/67698), [#67800](https://github.com/PaddlePaddle/Paddle/pull/67800), [#67933](https://github.com/PaddlePaddle/Paddle/pull/67933), [#67893](https://github.com/PaddlePaddle/Paddle/pull/67893), [#68108](https://github.com/PaddlePaddle/Paddle/pull/68108), [#67927](https://github.com/PaddlePaddle/Paddle/pull/67927), [#68322](https://github.com/PaddlePaddle/Paddle/pull/68322), [#68341](https://github.com/PaddlePaddle/Paddle/pull/68341), [#68415](https://github.com/PaddlePaddle/Paddle/pull/68415), [#68372](https://github.com/PaddlePaddle/Paddle/pull/68372), [#68559](https://github.com/PaddlePaddle/Paddle/pull/68559), [#68598](https://github.com/PaddlePaddle/Paddle/pull/68598), [#68708](https://github.com/PaddlePaddle/Paddle/pull/68708), [#68780](https://github.com/PaddlePaddle/Paddle/pull/68780), [#68992](https://github.com/PaddlePaddle/Paddle/pull/68992), [#68989](https://github.com/PaddlePaddle/Paddle/pull/68989), [#68895](https://github.com/PaddlePaddle/Paddle/pull/68895), [#69014](https://github.com/PaddlePaddle/Paddle/pull/69014), [#69139](https://github.com/PaddlePaddle/Paddle/pull/69139), [#68996](https://github.com/PaddlePaddle/Paddle/pull/68996), [#69090](https://github.com/PaddlePaddle/Paddle/pull/69090), [#68922](https://github.com/PaddlePaddle/Paddle/pull/68922), [#69333](https://github.com/PaddlePaddle/Paddle/pull/69333), [#69141](https://github.com/PaddlePaddle/Paddle/pull/69141), [#69609](https://github.com/PaddlePaddle/Paddle/pull/69609), [#69652](https://github.com/PaddlePaddle/Paddle/pull/69652), [#69715](https://github.com/PaddlePaddle/Paddle/pull/69715), [#69716](https://github.com/PaddlePaddle/Paddle/pull/69716), [#69934](https://github.com/PaddlePaddle/Paddle/pull/69934), [#70253](https://github.com/PaddlePaddle/Paddle/pull/70253), [#70297](https://github.com/PaddlePaddle/Paddle/pull/70297), [#70252](https://github.com/PaddlePaddle/Paddle/pull/70252), 
[#70468](https://github.com/PaddlePaddle/Paddle/pull/70468), [#70102](https://github.com/PaddlePaddle/Paddle/pull/70102), [#70546](https://github.com/PaddlePaddle/Paddle/pull/70546), [#70616](https://github.com/PaddlePaddle/Paddle/pull/70616), [#70582](https://github.com/PaddlePaddle/Paddle/pull/70582), [#70635](https://github.com/PaddlePaddle/Paddle/pull/70635), [#70499](https://github.com/PaddlePaddle/Paddle/pull/70499), [#70755](https://github.com/PaddlePaddle/Paddle/pull/70755), [#70935](https://github.com/PaddlePaddle/Paddle/pull/70935), [#71133](https://github.com/PaddlePaddle/Paddle/pull/71133), [#71172](https://github.com/PaddlePaddle/Paddle/pull/71172), [#71238](https://github.com/PaddlePaddle/Paddle/pull/71238), [#71230](https://github.com/PaddlePaddle/Paddle/pull/71230), [#71394](https://github.com/PaddlePaddle/Paddle/pull/71394) -- Focus on optimizing the performance of Tensor basic index, advanced index, and combined index, improving computational performance by 2X to 31X on GPUs and 1.8X to 1004X on CPUs. [#60254](https://github.com/PaddlePaddle/Paddle/pull/60254), [#60276](https://github.com/PaddlePaddle/Paddle/pull/60276), [#60452](https://github.com/PaddlePaddle/Paddle/pull/60452), [#60771](https://github.com/PaddlePaddle/Paddle/pull/60771), [#61021](https://github.com/PaddlePaddle/Paddle/pull/61021), [#60983](https://github.com/PaddlePaddle/Paddle/pull/60983), [#61060](https://github.com/PaddlePaddle/Paddle/pull/61060), [#60618](https://github.com/PaddlePaddle/Paddle/pull/60618) +### Document optimization -### Bug Fixing +- Enhanced several API documents to make them easier to read and understand. 
[#67772](https://github.com/PaddlePaddle/Paddle/pull/67772), [#69895](https://github.com/PaddlePaddle/Paddle/pull/69895), [#65904](https://github.com/PaddlePaddle/Paddle/pull/65904), [#66480](https://github.com/PaddlePaddle/Paddle/pull/66480), [#66974](https://github.com/PaddlePaddle/Paddle/pull/66974), [#67100](https://github.com/PaddlePaddle/Paddle/pull/67100), [#66991](https://github.com/PaddlePaddle/Paddle/pull/66991), [#67287](https://github.com/PaddlePaddle/Paddle/pull/67287), [#67841](https://github.com/PaddlePaddle/Paddle/pull/67841), [#68206](https://github.com/PaddlePaddle/Paddle/pull/68206), [#68305](https://github.com/PaddlePaddle/Paddle/pull/68305), [#68462](https://github.com/PaddlePaddle/Paddle/pull/68462), [#67061](https://github.com/PaddlePaddle/Paddle/pull/67061), [#66503](https://github.com/PaddlePaddle/Paddle/pull/66503), [#68856](https://github.com/PaddlePaddle/Paddle/pull/68856), [#68866](https://github.com/PaddlePaddle/Paddle/pull/68866), [#68768](https://github.com/PaddlePaddle/Paddle/pull/68768), [#69215](https://github.com/PaddlePaddle/Paddle/pull/69215), [#69449](https://github.com/PaddlePaddle/Paddle/pull/69449), [#69396](https://github.com/PaddlePaddle/Paddle/pull/69396), [#69498](https://github.com/PaddlePaddle/Paddle/pull/69498), [#69413](https://github.com/PaddlePaddle/Paddle/pull/69413), [#69404](https://github.com/PaddlePaddle/Paddle/pull/69404), [#69729](https://github.com/PaddlePaddle/Paddle/pull/69729), [#69749](https://github.com/PaddlePaddle/Paddle/pull/69749), [#69266](https://github.com/PaddlePaddle/Paddle/pull/69266), [#69989](https://github.com/PaddlePaddle/Paddle/pull/69989), [#70209](https://github.com/PaddlePaddle/Paddle/pull/70209), [#70128](https://github.com/PaddlePaddle/Paddle/pull/70128), [#70143](https://github.com/PaddlePaddle/Paddle/pull/70143), [#69874](https://github.com/PaddlePaddle/Paddle/pull/69874), [#70242](https://github.com/PaddlePaddle/Paddle/pull/70242), 
[#70145](https://github.com/PaddlePaddle/Paddle/pull/70145), [#70813](https://github.com/PaddlePaddle/Paddle/pull/70813), [#71046](https://github.com/PaddlePaddle/Paddle/pull/71046) -- Fix errors in `paddle.optimizer.LBFGS` caused by using non-Tensor computations [#60219](https://github.com/PaddlePaddle/Paddle/pull/60219) -- Fix the problem of random numbers not being fixed in `paddle.optimizer.LBFGS` [#60591](https://github.com/PaddlePaddle/Paddle/pull/60591) -- Fix the incorrect calculation of gradient of `set_value` operator [#59034](https://github.com/PaddlePaddle/Paddle/pull/59034) -- Fix the problem of Tensor basic index adapting to PIR [#60259](https://github.com/PaddlePaddle/Paddle/pull/60259), [#61103](https://github.com/PaddlePaddle/Paddle/pull/61103) -- Fix the problem of Tensor combined index assignment [problem](https://github.com/PaddlePaddle/Paddle/issues/60376) [#60447](https://github.com/PaddlePaddle/Paddle/pull/60447) -- Fix the problem when Tensor combined index takes values [problem] [#61922](https://github.com/PaddlePaddle/Paddle/pull/61922) -- Fix `paddle.flatten` stride calculation error issue, with being able to add `paddle.flatten_` [#63084](https://github.com/PaddlePaddle/Paddle/pull/63084) -- Fix the result inconsistency problem between `paddle.index_fill` and `paddle.index_fill_` [#59863](https://github.com/PaddlePaddle/Paddle/pull/59863) -- Fix the `paddle.masked_scatter` error report issue [#60835](https://github.com/PaddlePaddle/Paddle/pull/60835) -- Fix the `paddle.histogramdd` cpu error report issue [#61891](https://github.com/PaddlePaddle/Paddle/pull/61891) -- Fix the bug that `paddle.cast_` continuous use on cpu leads to incorrect result [#60054](https://github.com/PaddlePaddle/Paddle/pull/60054) -- Fix `paddle.put_along_axis` bug when input size is very large [#60551](https://github.com/PaddlePaddle/Paddle/pull/60551) -- Fix `paddle.nanmedian` cpu error report issue [#63221](https://github.com/PaddlePaddle/Paddle/pull/63221) -- 
Fix the bug that `paddle.median` does not support inputs other than floating-point types in the min branch. [#64444](https://github.com/PaddlePaddle/Paddle/pull/64444) -- Fix the dataloader issue in distributed scenarios. [#62696](https://github.com/PaddlePaddle/Paddle/pull/62696), [#63378](https://github.com/PaddlePaddle/Paddle/pull/63378) -- Fix the formatting issue in error prompt [#63106](https://github.com/PaddlePaddle/Paddle/pull/63106), [#63144](https://github.com/PaddlePaddle/Paddle/pull/63144) -- Fix the format issue under GLOG_v>=6. [#63345](https://github.com/PaddlePaddle/Paddle/pull/63345) +## 2. Basic execution architecture -### Security Improvements +PIR is fully implemented and enabled by default. It supports one-click dynamic-to-static conversion and gives the framework strong performance and good scalability. -- Enhance the checking of parent_ids [#62826](https://github.com/PaddlePaddle/Paddle/pull/62826) +### Bug Fixes -## Basic Execution Architecture +- Fixed accuracy issues caused by parameter configuration. [#65814](https://github.com/PaddlePaddle/Paddle/pull/65814) +- Fixed bugs related to save/load. 
[#65268](https://github.com/PaddlePaddle/Paddle/pull/65268), [#65359](https://github.com/PaddlePaddle/Paddle/pull/65359), [#65373](https://github.com/PaddlePaddle/Paddle/pull/65373), [#65314](https://github.com/PaddlePaddle/Paddle/pull/65314), [#65446](https://github.com/PaddlePaddle/Paddle/pull/65446), [#65476](https://github.com/PaddlePaddle/Paddle/pull/65476), [#66891](https://github.com/PaddlePaddle/Paddle/pull/66891), [#66931](https://github.com/PaddlePaddle/Paddle/pull/66931), [#65978](https://github.com/PaddlePaddle/Paddle/pull/65978), [#67654](https://github.com/PaddlePaddle/Paddle/pull/67654), [#67906](https://github.com/PaddlePaddle/Paddle/pull/67906), [#68723](https://github.com/PaddlePaddle/Paddle/pull/68723), [#71452](https://github.com/PaddlePaddle/Paddle/pull/71452), [#71457](https://github.com/PaddlePaddle/Paddle/pull/71457), [#67819](https://github.com/PaddlePaddle/Paddle/pull/67819), [#68120](https://github.com/PaddlePaddle/Paddle/pull/68120), [#68300](https://github.com/PaddlePaddle/Paddle/pull/68300), [#68315](https://github.com/PaddlePaddle/Paddle/pull/68315), [#68743](https://github.com/PaddlePaddle/Paddle/pull/68743), [#68744](https://github.com/PaddlePaddle/Paddle/pull/68744), [#69585](https://github.com/PaddlePaddle/Paddle/pull/69585), [#71165](https://github.com/PaddlePaddle/Paddle/pull/71165), [#71400](https://github.com/PaddlePaddle/Paddle/pull/71400) +- Skip/fix failed unit tests in PIR mode, including scenarios such as Windows and XPU. 
[#65690](https://github.com/PaddlePaddle/Paddle/pull/65690), [#65759](https://github.com/PaddlePaddle/Paddle/pull/65759), [#65730](https://github.com/PaddlePaddle/Paddle/pull/65730), [#65760](https://github.com/PaddlePaddle/Paddle/pull/65760), [#65833](https://github.com/PaddlePaddle/Paddle/pull/65833), [#65834](https://github.com/PaddlePaddle/Paddle/pull/65834), [#65856](https://github.com/PaddlePaddle/Paddle/pull/65856), [#65886](https://github.com/PaddlePaddle/Paddle/pull/65886), [#65899](https://github.com/PaddlePaddle/Paddle/pull/65899), [#65932](https://github.com/PaddlePaddle/Paddle/pull/65932), [#65998](https://github.com/PaddlePaddle/Paddle/pull/65998), [#65953](https://github.com/PaddlePaddle/Paddle/pull/65953), [#65997](https://github.com/PaddlePaddle/Paddle/pull/65997), [#66061](https://github.com/PaddlePaddle/Paddle/pull/66061), [#66111](https://github.com/PaddlePaddle/Paddle/pull/66111), [#66137](https://github.com/PaddlePaddle/Paddle/pull/66137), [#66073](https://github.com/PaddlePaddle/Paddle/pull/66073), [#66203](https://github.com/PaddlePaddle/Paddle/pull/66203), [#66227](https://github.com/PaddlePaddle/Paddle/pull/66227), [#65744](https://github.com/PaddlePaddle/Paddle/pull/65744), [#66234](https://github.com/PaddlePaddle/Paddle/pull/66234), [#67487](https://github.com/PaddlePaddle/Paddle/pull/67487), [#67561](https://github.com/PaddlePaddle/Paddle/pull/67561), [#67584](https://github.com/PaddlePaddle/Paddle/pull/67584), [#67742](https://github.com/PaddlePaddle/Paddle/pull/67742), [#69832](https://github.com/PaddlePaddle/Paddle/pull/69832), [#65885](https://github.com/PaddlePaddle/Paddle/pull/65885), [#66709](https://github.com/PaddlePaddle/Paddle/pull/66709), [#66734](https://github.com/PaddlePaddle/Paddle/pull/66734), [#66959](https://github.com/PaddlePaddle/Paddle/pull/66959), [#67399](https://github.com/PaddlePaddle/Paddle/pull/67399), [#67389](https://github.com/PaddlePaddle/Paddle/pull/67389), 
[#67230](https://github.com/PaddlePaddle/Paddle/pull/67230), [#67403](https://github.com/PaddlePaddle/Paddle/pull/67403), [#67619](https://github.com/PaddlePaddle/Paddle/pull/67619), [#67662](https://github.com/PaddlePaddle/Paddle/pull/67662), [#67902](https://github.com/PaddlePaddle/Paddle/pull/67902), [#67382](https://github.com/PaddlePaddle/Paddle/pull/67382), [#67430](https://github.com/PaddlePaddle/Paddle/pull/67430), [#67517](https://github.com/PaddlePaddle/Paddle/pull/67517), [#67533](https://github.com/PaddlePaddle/Paddle/pull/67533), [#67573](https://github.com/PaddlePaddle/Paddle/pull/67573), [#67468](https://github.com/PaddlePaddle/Paddle/pull/67468), [#67640](https://github.com/PaddlePaddle/Paddle/pull/67640), [#67667](https://github.com/PaddlePaddle/Paddle/pull/67667), [#67716](https://github.com/PaddlePaddle/Paddle/pull/67716), [#68386](https://github.com/PaddlePaddle/Paddle/pull/68386), [#67234](https://github.com/PaddlePaddle/Paddle/pull/67234), [#67266](https://github.com/PaddlePaddle/Paddle/pull/67266), [#67362](https://github.com/PaddlePaddle/Paddle/pull/67362), [#67631](https://github.com/PaddlePaddle/Paddle/pull/67631), [#68081](https://github.com/PaddlePaddle/Paddle/pull/68081) +- Fixed bugs related to dynamic graphs. [#65619](https://github.com/PaddlePaddle/Paddle/pull/65619), [#69163](https://github.com/PaddlePaddle/Paddle/pull/69163), [#68862](https://github.com/PaddlePaddle/Paddle/pull/68862), [#68164](https://github.com/PaddlePaddle/Paddle/pull/68164), [#69867](https://github.com/PaddlePaddle/Paddle/pull/69867) +- Fixed bugs related to control flow. [#65722](https://github.com/PaddlePaddle/Paddle/pull/65722), [#70181](https://github.com/PaddlePaddle/Paddle/pull/70181) +- Fixed kernel operation-related bugs, including issues with operation positions and null pointers. 
[#66334](https://github.com/PaddlePaddle/Paddle/pull/66334), [#67931](https://github.com/PaddlePaddle/Paddle/pull/67931), [#70353](https://github.com/PaddlePaddle/Paddle/pull/70353) +- Fixed AMP-related bugs. [#66778](https://github.com/PaddlePaddle/Paddle/pull/66778), [#67582](https://github.com/PaddlePaddle/Paddle/pull/67582), [#67704](https://github.com/PaddlePaddle/Paddle/pull/67704), [#68655](https://github.com/PaddlePaddle/Paddle/pull/68655) +- Fixed CINN-related bugs. [#69577](https://github.com/PaddlePaddle/Paddle/pull/69577), [#71101](https://github.com/PaddlePaddle/Paddle/pull/71101), [#71387](https://github.com/PaddlePaddle/Paddle/pull/71387), [#71401](https://github.com/PaddlePaddle/Paddle/pull/71401) +- Fixed bugs related to dynamic-to-static conversion. [#67617](https://github.com/PaddlePaddle/Paddle/pull/67617), [#67936](https://github.com/PaddlePaddle/Paddle/pull/67936), [#68938](https://github.com/PaddlePaddle/Paddle/pull/68938), [#68734](https://github.com/PaddlePaddle/Paddle/pull/68734), [#69010](https://github.com/PaddlePaddle/Paddle/pull/69010), [#69408](https://github.com/PaddlePaddle/Paddle/pull/69408), [#69461](https://github.com/PaddlePaddle/Paddle/pull/69461), [#69699](https://github.com/PaddlePaddle/Paddle/pull/69699), [#69774](https://github.com/PaddlePaddle/Paddle/pull/69774), [#69803](https://github.com/PaddlePaddle/Paddle/pull/69803), [#69853](https://github.com/PaddlePaddle/Paddle/pull/69853), [#70510](https://github.com/PaddlePaddle/Paddle/pull/70510), [#70830](https://github.com/PaddlePaddle/Paddle/pull/70830), [#70904](https://github.com/PaddlePaddle/Paddle/pull/70904), [#70913](https://github.com/PaddlePaddle/Paddle/pull/70913), [#71040](https://github.com/PaddlePaddle/Paddle/pull/71040), [#71048](https://github.com/PaddlePaddle/Paddle/pull/71048), [#71106](https://github.com/PaddlePaddle/Paddle/pull/71106), [#71201](https://github.com/PaddlePaddle/Paddle/pull/71201), 
[#71216](https://github.com/PaddlePaddle/Paddle/pull/71216), [#71223](https://github.com/PaddlePaddle/Paddle/pull/71223), [#71296](https://github.com/PaddlePaddle/Paddle/pull/71296), [#71385](https://github.com/PaddlePaddle/Paddle/pull/71385), [#71505](https://github.com/PaddlePaddle/Paddle/pull/71505), [#66934](https://github.com/PaddlePaddle/Paddle/pull/66934), [#71096](https://github.com/PaddlePaddle/Paddle/pull/71096), [#71144](https://github.com/PaddlePaddle/Paddle/pull/71144), [#71430](https://github.com/PaddlePaddle/Paddle/pull/71430), [#71437](https://github.com/PaddlePaddle/Paddle/pull/71437), [#71473](https://github.com/PaddlePaddle/Paddle/pull/71473), [#71412](https://github.com/PaddlePaddle/Paddle/pull/71412), [#65648](https://github.com/PaddlePaddle/Paddle/pull/65648), [#67853](https://github.com/PaddlePaddle/Paddle/pull/67853), [#66543](https://github.com/PaddlePaddle/Paddle/pull/66543), [#68229](https://github.com/PaddlePaddle/Paddle/pull/68229), [#70846](https://github.com/PaddlePaddle/Paddle/pull/70846), [#67532](https://github.com/PaddlePaddle/Paddle/pull/67532) +- Fixed other bugs, including issues related to backpropagation gradient calculation, memory copying, and executor errors. 
[#65493](https://github.com/PaddlePaddle/Paddle/pull/65493), [#65678](https://github.com/PaddlePaddle/Paddle/pull/65678), [#65673](https://github.com/PaddlePaddle/Paddle/pull/65673), [#65794](https://github.com/PaddlePaddle/Paddle/pull/65794), [#66358](https://github.com/PaddlePaddle/Paddle/pull/66358), [#66875](https://github.com/PaddlePaddle/Paddle/pull/66875), [#67339](https://github.com/PaddlePaddle/Paddle/pull/67339), [#67465](https://github.com/PaddlePaddle/Paddle/pull/67465), [#67754](https://github.com/PaddlePaddle/Paddle/pull/67754), [#67835](https://github.com/PaddlePaddle/Paddle/pull/67835), [#67892](https://github.com/PaddlePaddle/Paddle/pull/67892), [#67967](https://github.com/PaddlePaddle/Paddle/pull/67967), [#67952](https://github.com/PaddlePaddle/Paddle/pull/67952), [#68036](https://github.com/PaddlePaddle/Paddle/pull/68036), [#68063](https://github.com/PaddlePaddle/Paddle/pull/68063), [#68128](https://github.com/PaddlePaddle/Paddle/pull/68128), [#68151](https://github.com/PaddlePaddle/Paddle/pull/68151), [#68140](https://github.com/PaddlePaddle/Paddle/pull/68140), [#68167](https://github.com/PaddlePaddle/Paddle/pull/68167), [#68200](https://github.com/PaddlePaddle/Paddle/pull/68200), [#68325](https://github.com/PaddlePaddle/Paddle/pull/68325), [#68376](https://github.com/PaddlePaddle/Paddle/pull/68376), [#68539](https://github.com/PaddlePaddle/Paddle/pull/68539), [#68530](https://github.com/PaddlePaddle/Paddle/pull/68530), [#68637](https://github.com/PaddlePaddle/Paddle/pull/68637), [#68639](https://github.com/PaddlePaddle/Paddle/pull/68639), [#68688](https://github.com/PaddlePaddle/Paddle/pull/68688), [#68751](https://github.com/PaddlePaddle/Paddle/pull/68751), [#68806](https://github.com/PaddlePaddle/Paddle/pull/68806), [#68810](https://github.com/PaddlePaddle/Paddle/pull/68810), [#68779](https://github.com/PaddlePaddle/Paddle/pull/68779), [#68811](https://github.com/PaddlePaddle/Paddle/pull/68811), 
[#68844](https://github.com/PaddlePaddle/Paddle/pull/68844), [#68790](https://github.com/PaddlePaddle/Paddle/pull/68790), [#68870](https://github.com/PaddlePaddle/Paddle/pull/68870), [#68960](https://github.com/PaddlePaddle/Paddle/pull/68960), [#68999](https://github.com/PaddlePaddle/Paddle/pull/68999), [#69036](https://github.com/PaddlePaddle/Paddle/pull/69036), [#69188](https://github.com/PaddlePaddle/Paddle/pull/69188), [#69234](https://github.com/PaddlePaddle/Paddle/pull/69234), [#69375](https://github.com/PaddlePaddle/Paddle/pull/69375), [#69399](https://github.com/PaddlePaddle/Paddle/pull/69399), [#69538](https://github.com/PaddlePaddle/Paddle/pull/69538), [#69603](https://github.com/PaddlePaddle/Paddle/pull/69603), [#69633](https://github.com/PaddlePaddle/Paddle/pull/69633), [#69765](https://github.com/PaddlePaddle/Paddle/pull/69765), [#69768](https://github.com/PaddlePaddle/Paddle/pull/69768), [#69821](https://github.com/PaddlePaddle/Paddle/pull/69821), [#70091](https://github.com/PaddlePaddle/Paddle/pull/70091), [#70123](https://github.com/PaddlePaddle/Paddle/pull/70123), [#70147](https://github.com/PaddlePaddle/Paddle/pull/70147), [#70201](https://github.com/PaddlePaddle/Paddle/pull/70201), [#70198](https://github.com/PaddlePaddle/Paddle/pull/70198), [#69815](https://github.com/PaddlePaddle/Paddle/pull/69815), [#70420](https://github.com/PaddlePaddle/Paddle/pull/70420), [#70377](https://github.com/PaddlePaddle/Paddle/pull/70377), [#70552](https://github.com/PaddlePaddle/Paddle/pull/70552), [#70545](https://github.com/PaddlePaddle/Paddle/pull/70545), [#70595](https://github.com/PaddlePaddle/Paddle/pull/70595), [#70836](https://github.com/PaddlePaddle/Paddle/pull/70836), [#70771](https://github.com/PaddlePaddle/Paddle/pull/70771), [#70922](https://github.com/PaddlePaddle/Paddle/pull/70922), [#70969](https://github.com/PaddlePaddle/Paddle/pull/70969), [#70926](https://github.com/PaddlePaddle/Paddle/pull/70926), 
[#71117](https://github.com/PaddlePaddle/Paddle/pull/71117), [#71151](https://github.com/PaddlePaddle/Paddle/pull/71151), [#71194](https://github.com/PaddlePaddle/Paddle/pull/71194), [#71234](https://github.com/PaddlePaddle/Paddle/pull/71234), [#71339](https://github.com/PaddlePaddle/Paddle/pull/71339), [#71445](https://github.com/PaddlePaddle/Paddle/pull/71445), [#66350](https://github.com/PaddlePaddle/Paddle/pull/66350), [#66533](https://github.com/PaddlePaddle/Paddle/pull/66533), [#66622](https://github.com/PaddlePaddle/Paddle/pull/66622), [#67721](https://github.com/PaddlePaddle/Paddle/pull/67721), [#67700](https://github.com/PaddlePaddle/Paddle/pull/67700), [#69207](https://github.com/PaddlePaddle/Paddle/pull/69207), [#69615](https://github.com/PaddlePaddle/Paddle/pull/69615), [#69785](https://github.com/PaddlePaddle/Paddle/pull/69785), [#67805](https://github.com/PaddlePaddle/Paddle/pull/67805) -PIR basic functions have been upgraded and improved comprehensively, and the maturity level has been greatly improved. Based on PIR, the design of the PaddlePaddle infrastructure is more reasonable, ensuring the excellent performance and good scalability of the framework. In this version, we have completed the inference verification of PIR in multiple scenarios: For the single-machine scenario, complete the PIR back-end switching in the dynamic-to-static scenarios; For inference scenario, complete the verification of all the stock models, and 84.2% of the models have a gain of 10%+; we have completed the verification of distributed scenarios based on PIR. Meanwhile, based on PIR, we have completed the development and validation of core modules such as control flow, backward logic, save/load, and OneDNN adaptation, which lays a solid foundation for the switching of the PaddlePaddle PIR to the default mode. 
The functional completeness, execution efficiency and stability of the PaddlePaddle framework operator system are further improved, bringing better use and development experience to the developers. +### Function optimization -### Function Optimization +- Support save/load. [#65296](https://github.com/PaddlePaddle/Paddle/pull/65296), [#65671](https://github.com/PaddlePaddle/Paddle/pull/65671), [#66231](https://github.com/PaddlePaddle/Paddle/pull/66231), [#66185](https://github.com/PaddlePaddle/Paddle/pull/66185), [#66722](https://github.com/PaddlePaddle/Paddle/pull/66722), [#66863](https://github.com/PaddlePaddle/Paddle/pull/66863), [#67057](https://github.com/PaddlePaddle/Paddle/pull/67057), [#68101](https://github.com/PaddlePaddle/Paddle/pull/68101), [#68628](https://github.com/PaddlePaddle/Paddle/pull/68628), [#66359](https://github.com/PaddlePaddle/Paddle/pull/66359), [#68481](https://github.com/PaddlePaddle/Paddle/pull/68481) +- Optimize the compilation process of custom operators. [#67615](https://github.com/PaddlePaddle/Paddle/pull/67615), [#67659](https://github.com/PaddlePaddle/Paddle/pull/67659) +- Support for composite operators. [#69121](https://github.com/PaddlePaddle/Paddle/pull/69121), [#69144](https://github.com/PaddlePaddle/Paddle/pull/69144), [#70204](https://github.com/PaddlePaddle/Paddle/pull/70204), [#71098](https://github.com/PaddlePaddle/Paddle/pull/71098), [#71335](https://github.com/PaddlePaddle/Paddle/pull/71335) +- Support for CINN compiler execution. [#69589](https://github.com/PaddlePaddle/Paddle/pull/69589), [#70115](https://github.com/PaddlePaddle/Paddle/pull/70115) +- Support for custom devices. 
[#70909](https://github.com/PaddlePaddle/Paddle/pull/70909), [#71294](https://github.com/PaddlePaddle/Paddle/pull/71294), [#71362](https://github.com/PaddlePaddle/Paddle/pull/71362), [#71010](https://github.com/PaddlePaddle/Paddle/pull/71010), [#71036](https://github.com/PaddlePaddle/Paddle/pull/71036), [#70637](https://github.com/PaddlePaddle/Paddle/pull/70637), [#71085](https://github.com/PaddlePaddle/Paddle/pull/71085) +- Execution support for other scenarios. [#65050](https://github.com/PaddlePaddle/Paddle/pull/65050), [#65664](https://github.com/PaddlePaddle/Paddle/pull/65664), [#65741](https://github.com/PaddlePaddle/Paddle/pull/65741), [#65786](https://github.com/PaddlePaddle/Paddle/pull/65786), [#65499](https://github.com/PaddlePaddle/Paddle/pull/65499), [#66441](https://github.com/PaddlePaddle/Paddle/pull/66441), [#67668](https://github.com/PaddlePaddle/Paddle/pull/67668), [#68199](https://github.com/PaddlePaddle/Paddle/pull/68199), [#69088](https://github.com/PaddlePaddle/Paddle/pull/69088), [#70199](https://github.com/PaddlePaddle/Paddle/pull/70199), [#70308](https://github.com/PaddlePaddle/Paddle/pull/70308), [#70709](https://github.com/PaddlePaddle/Paddle/pull/70709), [#70937](https://github.com/PaddlePaddle/Paddle/pull/70937), [#71066](https://github.com/PaddlePaddle/Paddle/pull/71066), [#71079](https://github.com/PaddlePaddle/Paddle/pull/71079), [#71121](https://github.com/PaddlePaddle/Paddle/pull/71121), [#71136](https://github.com/PaddlePaddle/Paddle/pull/71136), [#71205](https://github.com/PaddlePaddle/Paddle/pull/71205) -- Improve the basic functions of PIR, including basic type system enhancement, debugging, printing, Pass development, and AMP support, to enhance the development efficiency of PIR. 
[#60723](https://github.com/PaddlePaddle/Paddle/pull/60723), [#60677](https://github.com/PaddlePaddle/Paddle/pull/60677), [#60783](https://github.com/PaddlePaddle/Paddle/pull/60783), [#60798](https://github.com/PaddlePaddle/Paddle/pull/60798), [#61053](https://github.com/PaddlePaddle/Paddle/pull/61053), [#61366](https://github.com/PaddlePaddle/Paddle/pull/61366), [#61446](https://github.com/PaddlePaddle/Paddle/pull/61446), [#60024](https://github.com/PaddlePaddle/Paddle/pull/60024), [#59939](https://github.com/PaddlePaddle/Paddle/pull/59939), [#63376](https://github.com/PaddlePaddle/Paddle/pull/63376), [#61853](https://github.com/PaddlePaddle/Paddle/pull/61853), [#63914](https://github.com/PaddlePaddle/Paddle/pull/63914), [#60170](https://github.com/PaddlePaddle/Paddle/pull/60170), [#60678](https://github.com/PaddlePaddle/Paddle/pull/60678), [#64093](https://github.com/PaddlePaddle/Paddle/pull/64093), [#64065](https://github.com/PaddlePaddle/Paddle/pull/64065), [#62451](https://github.com/PaddlePaddle/Paddle/pull/62451), [#59784](https://github.com/PaddlePaddle/Paddle/pull/59784), [#60136](https://github.com/PaddlePaddle/Paddle/pull/60136), [#63336](https://github.com/PaddlePaddle/Paddle/pull/63336), [#62108](https://github.com/PaddlePaddle/Paddle/pull/62108), [#60860](https://github.com/PaddlePaddle/Paddle/pull/60860), [#60536](https://github.com/PaddlePaddle/Paddle/pull/60536), [#60590](https://github.com/PaddlePaddle/Paddle/pull/60590), [#60752](https://github.com/PaddlePaddle/Paddle/pull/60752), [#61435](https://github.com/PaddlePaddle/Paddle/pull/61435), [#62977](https://github.com/PaddlePaddle/Paddle/pull/62977), [#62139](https://github.com/PaddlePaddle/Paddle/pull/62139), [#60432](https://github.com/PaddlePaddle/Paddle/pull/60432), [#61452](https://github.com/PaddlePaddle/Paddle/pull/61452), [#61978](https://github.com/PaddlePaddle/Paddle/pull/61978), [#62262](https://github.com/PaddlePaddle/Paddle/pull/62262), 
[#62422](https://github.com/PaddlePaddle/Paddle/pull/62422), [#60359](https://github.com/PaddlePaddle/Paddle/pull/60359), [#62989](https://github.com/PaddlePaddle/Paddle/pull/62989), [#61297](https://github.com/PaddlePaddle/Paddle/pull/61297), [#61399](https://github.com/PaddlePaddle/Paddle/pull/61399), [#61871](https://github.com/PaddlePaddle/Paddle/pull/61871), [#61496](https://github.com/PaddlePaddle/Paddle/pull/61496), [#62413](https://github.com/PaddlePaddle/Paddle/pull/62413) -- Optimize the execution logic of the PaddlePaddle actuator, improve the Pass system, enhance the performance of training and inference, to better support distributed parallel logic operation. [#60182](https://github.com/PaddlePaddle/Paddle/pull/60182), [#60516](https://github.com/PaddlePaddle/Paddle/pull/60516), [#63573](https://github.com/PaddlePaddle/Paddle/pull/63573), [#60181](https://github.com/PaddlePaddle/Paddle/pull/60181), [#59792](https://github.com/PaddlePaddle/Paddle/pull/59792), [#62025](https://github.com/PaddlePaddle/Paddle/pull/62025), [#61160](https://github.com/PaddlePaddle/Paddle/pull/61160), [#61188](https://github.com/PaddlePaddle/Paddle/pull/61188), [#61277](https://github.com/PaddlePaddle/Paddle/pull/61277), [#61669](https://github.com/PaddlePaddle/Paddle/pull/61669), [#60823](https://github.com/PaddlePaddle/Paddle/pull/60823), [#61310](https://github.com/PaddlePaddle/Paddle/pull/61310), [#60892](https://github.com/PaddlePaddle/Paddle/pull/60892), [#60578](https://github.com/PaddlePaddle/Paddle/pull/60578), [#61657](https://github.com/PaddlePaddle/Paddle/pull/61657), [#62638](https://github.com/PaddlePaddle/Paddle/pull/62638), [#63960](https://github.com/PaddlePaddle/Paddle/pull/63960), [#64234](https://github.com/PaddlePaddle/Paddle/pull/64234) +### New Features -### PIR New Features +- SOT adapts to Python 3.13 bytecode, supporting static graph conversion (SOT mode) under Python 3.13. 
[#68071](https://github.com/PaddlePaddle/Paddle/pull/68071), [#69126](https://github.com/PaddlePaddle/Paddle/pull/69126), [#69131](https://github.com/PaddlePaddle/Paddle/pull/69131), [#69196](https://github.com/PaddlePaddle/Paddle/pull/69196), [#69232](https://github.com/PaddlePaddle/Paddle/pull/69232), [#69253](https://github.com/PaddlePaddle/Paddle/pull/69253), [#69267](https://github.com/PaddlePaddle/Paddle/pull/69267), [#69412](https://github.com/PaddlePaddle/Paddle/pull/69412), [#69431](https://github.com/PaddlePaddle/Paddle/pull/69431), [#69432](https://github.com/PaddlePaddle/Paddle/pull/69432), [#69436](https://github.com/PaddlePaddle/Paddle/pull/69436), [#69557](https://github.com/PaddlePaddle/Paddle/pull/69557), [#69567](https://github.com/PaddlePaddle/Paddle/pull/69567), [#69700](https://github.com/PaddlePaddle/Paddle/pull/69700), [#69707](https://github.com/PaddlePaddle/Paddle/pull/69707), [#69735](https://github.com/PaddlePaddle/Paddle/pull/69735), [#69738](https://github.com/PaddlePaddle/Paddle/pull/69738), [#69744](https://github.com/PaddlePaddle/Paddle/pull/69744), [#69753](https://github.com/PaddlePaddle/Paddle/pull/69753), [#69887](https://github.com/PaddlePaddle/Paddle/pull/69887), [#69920](https://github.com/PaddlePaddle/Paddle/pull/69920), [#69950](https://github.com/PaddlePaddle/Paddle/pull/69950), [#70319](https://github.com/PaddlePaddle/Paddle/pull/70319), [#70927](https://github.com/PaddlePaddle/Paddle/pull/70927) +- Support for custom devices. [#68061](https://github.com/PaddlePaddle/Paddle/pull/68061), [#68836](https://github.com/PaddlePaddle/Paddle/pull/68836), [#70366](https://github.com/PaddlePaddle/Paddle/pull/70366), [#70549](https://github.com/PaddlePaddle/Paddle/pull/70549) +- Adapted PIR forward execution. [#65335](https://github.com/PaddlePaddle/Paddle/pull/65335) +- Support save/load. [#67910](https://github.com/PaddlePaddle/Paddle/pull/67910) +- Adapted to pylayer. 
[#70335](https://github.com/PaddlePaddle/Paddle/pull/70335) +- Adapt lazy_init. [#67379](https://github.com/PaddlePaddle/Paddle/pull/67379), [#67467](https://github.com/PaddlePaddle/Paddle/pull/67467) +- Optimize the logic under PIR. [#67961](https://github.com/PaddlePaddle/Paddle/pull/67961) +- Support for other scenarios. [#68344](https://github.com/PaddlePaddle/Paddle/pull/68344), [#70071](https://github.com/PaddlePaddle/Paddle/pull/70071), [#70291](https://github.com/PaddlePaddle/Paddle/pull/70291), [#70752](https://github.com/PaddlePaddle/Paddle/pull/70752), [#70812](https://github.com/PaddlePaddle/Paddle/pull/70812), [#71033](https://github.com/PaddlePaddle/Paddle/pull/71033) -- Realize reverse logic based on PIR, generate reverse computation graph directly, and support higher-order differentiation at the same time. [#60174](https://github.com/PaddlePaddle/Paddle/pull/60174), [#60328](https://github.com/PaddlePaddle/Paddle/pull/60328), [#60818](https://github.com/PaddlePaddle/Paddle/pull/60818), [#61352](https://github.com/PaddlePaddle/Paddle/pull/61352), [#61661](https://github.com/PaddlePaddle/Paddle/pull/61661), [#61927](https://github.com/PaddlePaddle/Paddle/pull/61927), [#62772](https://github.com/PaddlePaddle/Paddle/pull/62772), [#60360](https://github.com/PaddlePaddle/Paddle/pull/60360), [#60866](https://github.com/PaddlePaddle/Paddle/pull/60866), [#60970](https://github.com/PaddlePaddle/Paddle/pull/60970), [#60810](https://github.com/PaddlePaddle/Paddle/pull/60810), [#64696](https://github.com/PaddlePaddle/Paddle/pull/64696), [#59844](https://github.com/PaddlePaddle/Paddle/pull/59844), [#59999](https://github.com/PaddlePaddle/Paddle/pull/59999), [#60262](https://github.com/PaddlePaddle/Paddle/pull/60262), [#60338](https://github.com/PaddlePaddle/Paddle/pull/60338), [#59935](https://github.com/PaddlePaddle/Paddle/pull/59935), [#59982](https://github.com/PaddlePaddle/Paddle/pull/59982), [#60221](https://github.com/PaddlePaddle/Paddle/pull/60221), 
[#62621](https://github.com/PaddlePaddle/Paddle/pull/62621), [#60044](https://github.com/PaddlePaddle/Paddle/pull/60044), [#59790](https://github.com/PaddlePaddle/Paddle/pull/59790), [#60529](https://github.com/PaddlePaddle/Paddle/pull/60529), [#61378](https://github.com/PaddlePaddle/Paddle/pull/61378), [#61584](https://github.com/PaddlePaddle/Paddle/pull/61584) -- Implement control flow logic based on PIR to improve the expressive ability of PIR and better support multi-scenario services such as training and inference. [#61396](https://github.com/PaddlePaddle/Paddle/pull/61396), [#64045](https://github.com/PaddlePaddle/Paddle/pull/64045), [#60953](https://github.com/PaddlePaddle/Paddle/pull/60953), [#61091](https://github.com/PaddlePaddle/Paddle/pull/61091), [#61304](https://github.com/PaddlePaddle/Paddle/pull/61304), [#62093](https://github.com/PaddlePaddle/Paddle/pull/62093), [#64710](https://github.com/PaddlePaddle/Paddle/pull/64710), [#60668](https://github.com/PaddlePaddle/Paddle/pull/60668), [#60433](https://github.com/PaddlePaddle/Paddle/pull/60433), [#60963](https://github.com/PaddlePaddle/Paddle/pull/60963), [#61192](https://github.com/PaddlePaddle/Paddle/pull/61192), [#60895](https://github.com/PaddlePaddle/Paddle/pull/60895), [#60017](https://github.com/PaddlePaddle/Paddle/pull/60017), [#60369](https://github.com/PaddlePaddle/Paddle/pull/60369), [#60330](https://github.com/PaddlePaddle/Paddle/pull/60330), [#60364](https://github.com/PaddlePaddle/Paddle/pull/60364), [#61416](https://github.com/PaddlePaddle/Paddle/pull/61416), [#60460](https://github.com/PaddlePaddle/Paddle/pull/60460), [#60703](https://github.com/PaddlePaddle/Paddle/pull/60703), [#61027](https://github.com/PaddlePaddle/Paddle/pull/61027) -- Realize save/load logic based on PIR, to carry out the process of PIR and upstream/downstream training and inference services. 
[#63438](https://github.com/PaddlePaddle/Paddle/pull/63438), [#63574](https://github.com/PaddlePaddle/Paddle/pull/63574), [#64281](https://github.com/PaddlePaddle/Paddle/pull/64281), [#64327](https://github.com/PaddlePaddle/Paddle/pull/64327), [#63622](https://github.com/PaddlePaddle/Paddle/pull/63622), [#64507](https://github.com/PaddlePaddle/Paddle/pull/64507), [#63389](https://github.com/PaddlePaddle/Paddle/pull/63389), [#63539](https://github.com/PaddlePaddle/Paddle/pull/63539), [#63749](https://github.com/PaddlePaddle/Paddle/pull/63749), [#63957](https://github.com/PaddlePaddle/Paddle/pull/63957), [#64044](https://github.com/PaddlePaddle/Paddle/pull/64044), [#64121](https://github.com/PaddlePaddle/Paddle/pull/64121), [#64239](https://github.com/PaddlePaddle/Paddle/pull/64239), [#63818](https://github.com/PaddlePaddle/Paddle/pull/63818), [#63910](https://github.com/PaddlePaddle/Paddle/pull/63910),[#63380](https://github.com/PaddlePaddle/Paddle/pull/63380)[#63380](https://github.com/PaddlePaddle/Paddle/pull/63380),[#63275](https://github.com/PaddlePaddle/Paddle/pull/63275),[#63663](https://github.com/PaddlePaddle/Paddle/pull/63663),[#64692](https://github.com/PaddlePaddle/Paddle/pull/64692),[#63958](https://github.com/PaddlePaddle/Paddle/pull/63958) -- Completed the development and validation of OneDNN related basic functions to prepare for the full-scale switch of OneDNN. 
[#60680](https://github.com/PaddlePaddle/Paddle/pull/60680), [#60665](https://github.com/PaddlePaddle/Paddle/pull/60665), [#63162](https://github.com/PaddlePaddle/Paddle/pull/63162), [#59917](https://github.com/PaddlePaddle/Paddle/pull/59917), [#62901](https://github.com/PaddlePaddle/Paddle/pull/62901), [#59918](https://github.com/PaddlePaddle/Paddle/pull/59918), [#60257](https://github.com/PaddlePaddle/Paddle/pull/60257), [#60502](https://github.com/PaddlePaddle/Paddle/pull/60502), [#61062](https://github.com/PaddlePaddle/Paddle/pull/61062), [#61170](https://github.com/PaddlePaddle/Paddle/pull/61170), [#61474](https://github.com/PaddlePaddle/Paddle/pull/61474), [#60874](https://github.com/PaddlePaddle/Paddle/pull/60874), [#61495](https://github.com/PaddlePaddle/Paddle/pull/61495), [#61664](https://github.com/PaddlePaddle/Paddle/pull/61664), [#61649](https://github.com/PaddlePaddle/Paddle/pull/61649), [#61592](https://github.com/PaddlePaddle/Paddle/pull/61592), [#61667](https://github.com/PaddlePaddle/Paddle/pull/61667), [#61137](https://github.com/PaddlePaddle/Paddle/pull/61137), [#60952](https://github.com/PaddlePaddle/Paddle/pull/60952), [#61651](https://github.com/PaddlePaddle/Paddle/pull/61651), [#62126](https://github.com/PaddlePaddle/Paddle/pull/62126), [#62187](https://github.com/PaddlePaddle/Paddle/pull/62187), [#61307](https://github.com/PaddlePaddle/Paddle/pull/61307), [#62734](https://github.com/PaddlePaddle/Paddle/pull/62734), [#60974](https://github.com/PaddlePaddle/Paddle/pull/60974), [#61451](https://github.com/PaddlePaddle/Paddle/pull/61451), [#61011](https://github.com/PaddlePaddle/Paddle/pull/61011), [#61218](https://github.com/PaddlePaddle/Paddle/pull/61218), [#61623](https://github.com/PaddlePaddle/Paddle/pull/61623), [#61893](https://github.com/PaddlePaddle/Paddle/pull/61893), [#61876](https://github.com/PaddlePaddle/Paddle/pull/61876), [#61892](https://github.com/PaddlePaddle/Paddle/pull/61892), 
[#62085](https://github.com/PaddlePaddle/Paddle/pull/62085), [#62220](https://github.com/PaddlePaddle/Paddle/pull/62220), [#62244](https://github.com/PaddlePaddle/Paddle/pull/62244), [#62265](https://github.com/PaddlePaddle/Paddle/pull/62265), [#60754](https://github.com/PaddlePaddle/Paddle/pull/60754), [#60896](https://github.com/PaddlePaddle/Paddle/pull/60896), [#61868](https://github.com/PaddlePaddle/Paddle/pull/61868), [#61659](https://github.com/PaddlePaddle/Paddle/pull/61659), [#62241](https://github.com/PaddlePaddle/Paddle/pull/62241), [#62471](https://github.com/PaddlePaddle/Paddle/pull/62471), [#61165](https://github.com/PaddlePaddle/Paddle/pull/61165),[#64441](https://github.com/PaddlePaddle/Paddle/pull/64441),[#63141](https://github.com/PaddlePaddle/Paddle/pull/63141),[#63145](https://github.com/PaddlePaddle/Paddle/pull/63145),[#63592](https://github.com/PaddlePaddle/Paddle/pull/63592),[#63617](https://github.com/PaddlePaddle/Paddle/pull/63617),[#63518](https://github.com/PaddlePaddle/Paddle/pull/63518),[#63726](https://github.com/PaddlePaddle/Paddle/pull/63726),[#63853](https://github.com/PaddlePaddle/Paddle/pull/63853),[#63812](https://github.com/PaddlePaddle/Paddle/pull/63812),[#63811](https://github.com/PaddlePaddle/Paddle/pull/63811),[#64524](https://github.com/PaddlePaddle/Paddle/pull/64524),[#62993](https://github.com/PaddlePaddle/Paddle/pull/62993),[#63516](https://github.com/PaddlePaddle/Paddle/pull/63516),[#62998](https://github.com/PaddlePaddle/Paddle/pull/62998),[#63151](https://github.com/PaddlePaddle/Paddle/pull/63151),[#64661](https://github.com/PaddlePaddle/Paddle/pull/64661),[#64433](https://github.com/PaddlePaddle/Paddle/pull/64433),[#64448](https://github.com/PaddlePaddle/Paddle/pull/64448),[#63201](https://github.com/PaddlePaddle/Paddle/pull/63201),[#63230](https://github.com/PaddlePaddle/Paddle/pull/63230),[#63233](https://github.com/PaddlePaddle/Paddle/pull/63233),[#63281](https://github.com/PaddlePaddle/Paddle/pull/63281),[#64671](https://github.com/PaddlePaddle/Paddle/pull/64671),
[#63274](https://github.com/PaddlePaddle/Paddle/pull/63274) -- Implement Sparse related logic based on PIR, including basic Type and operator expression, and complete the verification of Sparse key functions. [#62868](https://github.com/PaddlePaddle/Paddle/pull/62868), [#63015](https://github.com/PaddlePaddle/Paddle/pull/63015), [#62894](https://github.com/PaddlePaddle/Paddle/pull/62894) +### Changes unrelated to ordinary users -### Dynamic-to-static Function Optimization +- Optimize the SOT debugging experience and improve development efficiency. [#67560](https://github.com/PaddlePaddle/Paddle/pull/67560), [#69072](https://github.com/PaddlePaddle/Paddle/pull/69072), [#69837](https://github.com/PaddlePaddle/Paddle/pull/69837), [#70134](https://github.com/PaddlePaddle/Paddle/pull/70134), [#70387](https://github.com/PaddlePaddle/Paddle/pull/70387), [#70740](https://github.com/PaddlePaddle/Paddle/pull/70740), [#71118](https://github.com/PaddlePaddle/Paddle/pull/71118), [#71268](https://github.com/PaddlePaddle/Paddle/pull/71268), [#71275](https://github.com/PaddlePaddle/Paddle/pull/71275), [#71458](https://github.com/PaddlePaddle/Paddle/pull/71458), [#71460](https://github.com/PaddlePaddle/Paddle/pull/71460) +- Other changes that do not affect users. 
[#65393](https://github.com/PaddlePaddle/Paddle/pull/65393), [#65795](https://github.com/PaddlePaddle/Paddle/pull/65795), [#65799](https://github.com/PaddlePaddle/Paddle/pull/65799), [#65911](https://github.com/PaddlePaddle/Paddle/pull/65911), [#65977](https://github.com/PaddlePaddle/Paddle/pull/65977), [#66982](https://github.com/PaddlePaddle/Paddle/pull/66982), [#67563](https://github.com/PaddlePaddle/Paddle/pull/67563), [#68761](https://github.com/PaddlePaddle/Paddle/pull/68761), [#68909](https://github.com/PaddlePaddle/Paddle/pull/68909), [#69130](https://github.com/PaddlePaddle/Paddle/pull/69130), [#69233](https://github.com/PaddlePaddle/Paddle/pull/69233), [#69956](https://github.com/PaddlePaddle/Paddle/pull/69956), [#71142](https://github.com/PaddlePaddle/Paddle/pull/71142) -Optimize the dynamic-to-static basic capability, adapt to the dynamic dimension in SOT training scenarios, and support Python 3.12. +### Security Issues -- Complete the PIR adaptation in dynamic-to-static scenarios. [#60988](https://github.com/PaddlePaddle/Paddle/pull/60988), [#61936](https://github.com/PaddlePaddle/Paddle/pull/61936), [#59929](https://github.com/PaddlePaddle/Paddle/pull/59929), [#61790](https://github.com/PaddlePaddle/Paddle/pull/61790), [#64323](https://github.com/PaddlePaddle/Paddle/pull/64323), [#62030](https://github.com/PaddlePaddle/Paddle/pull/62030), [#61143](https://github.com/PaddlePaddle/Paddle/pull/61143), [#62680](https://github.com/PaddlePaddle/Paddle/pull/62680), [#63309](https://github.com/PaddlePaddle/Paddle/pull/63309), [#63311](https://github.com/PaddlePaddle/Paddle/pull/63311), [#62199](https://github.com/PaddlePaddle/Paddle/pull/62199) -- SOT adapts to Python 3.12 bytecode, and the dynamic-to-static SOT function can be used in Python 3.12. 
[#61414](https://github.com/PaddlePaddle/Paddle/pull/61414), [#59562](https://github.com/PaddlePaddle/Paddle/pull/59562), [#61031](https://github.com/PaddlePaddle/Paddle/pull/61031), [#61272](https://github.com/PaddlePaddle/Paddle/pull/61272), [#61412](https://github.com/PaddlePaddle/Paddle/pull/61412), [#61305](https://github.com/PaddlePaddle/Paddle/pull/61305), [#61964](https://github.com/PaddlePaddle/Paddle/pull/61964), [#62008](https://github.com/PaddlePaddle/Paddle/pull/62008), [#62028](https://github.com/PaddlePaddle/Paddle/pull/62028), [#61995](https://github.com/PaddlePaddle/Paddle/pull/61995), [#62073](https://github.com/PaddlePaddle/Paddle/pull/62073), [#62120](https://github.com/PaddlePaddle/Paddle/pull/62120), [#62218](https://github.com/PaddlePaddle/Paddle/pull/62218), [#62155](https://github.com/PaddlePaddle/Paddle/pull/62155) -- SOT completes the adaptation of the dynamic dimension of the training scenario, avoiding triggering duplicate graph compositions in dimension changes, and improving the operation efficiency. [#64278](https://github.com/PaddlePaddle/Paddle/pull/64278), [#64435](https://github.com/PaddlePaddle/Paddle/pull/64435), [#64499](https://github.com/PaddlePaddle/Paddle/pull/64499), [#64500](https://github.com/PaddlePaddle/Paddle/pull/64500), [#62080](https://github.com/PaddlePaddle/Paddle/pull/62080) +- Introduced approval rules for IR (Intermediate Representation) save/load operations to enhance security and governance during model serialization. [#65737](https://github.com/PaddlePaddle/Paddle/pull/65737) -### Operator Mechanisms +### Others -For the problems of incomplete implementation of some kernels and inefficient calculation logic, we have improved and optimized some of the operator implementation and internal mechanisms of framework, fixed some known problems, and supported some new features. +- Sparse API migration. 
[#66139](https://github.com/PaddlePaddle/Paddle/pull/66139), [#66319](https://github.com/PaddlePaddle/Paddle/pull/66319), [#66866](https://github.com/PaddlePaddle/Paddle/pull/66866) +- PIR function extension. [#67966](https://github.com/PaddlePaddle/Paddle/pull/67966), [#69909](https://github.com/PaddlePaddle/Paddle/pull/69909) +- Migrate file locations. [#66477](https://github.com/PaddlePaddle/Paddle/pull/66477), [#66824](https://github.com/PaddlePaddle/Paddle/pull/66824), [#67592](https://github.com/PaddlePaddle/Paddle/pull/67592) +- Log addition. [#68382](https://github.com/PaddlePaddle/Paddle/pull/68382), [#70506](https://github.com/PaddlePaddle/Paddle/pull/70506) +- Enable PIR by default. [#68278](https://github.com/PaddlePaddle/Paddle/pull/68278) +- Header file organization. [#68422](https://github.com/PaddlePaddle/Paddle/pull/68422), [#68471](https://github.com/PaddlePaddle/Paddle/pull/68471) +- Compilation optimization. [#67831](https://github.com/PaddlePaddle/Paddle/pull/67831), [#67821](https://github.com/PaddlePaddle/Paddle/pull/67821), [#68717](https://github.com/PaddlePaddle/Paddle/pull/68717) +- Manage related tests with guards. [#67816](https://github.com/PaddlePaddle/Paddle/pull/67816), [#67827](https://github.com/PaddlePaddle/Paddle/pull/67827), [#67989](https://github.com/PaddlePaddle/Paddle/pull/67989) +- Fixed spelling errors. [#70784](https://github.com/PaddlePaddle/Paddle/pull/70784), [#70787](https://github.com/PaddlePaddle/Paddle/pull/70787) +- Check for CUDA errors. [#70399](https://github.com/PaddlePaddle/Paddle/pull/70399) -- For XPU kernel, we have optimized the data type support of `numel`, `concat`, and `slice`, and the mixed-precision training support for `AdamW` optimizer. 
[#63715](https://github.com/PaddlePaddle/Paddle/pull/63715), [#61617](https://github.com/PaddlePaddle/Paddle/pull/61617), [#61694](https://github.com/PaddlePaddle/Paddle/pull/61694), [#64542](https://github.com/PaddlePaddle/Paddle/pull/64542), [#63644](https://github.com/PaddlePaddle/Paddle/pull/63644), [#61340](https://github.com/PaddlePaddle/Paddle/pull/61340), [#63108](https://github.com/PaddlePaddle/Paddle/pull/63108) -- Improve the function and performance of some operators. [#59413](https://github.com/PaddlePaddle/Paddle/pull/59413), [#60295](https://github.com/PaddlePaddle/Paddle/pull/60295), [#64304](https://github.com/PaddlePaddle/Paddle/pull/64304), [#60979](https://github.com/PaddlePaddle/Paddle/pull/60979), [#63556](https://github.com/PaddlePaddle/Paddle/pull/63556), [#63061](https://github.com/PaddlePaddle/Paddle/pull/63061), [#62533](https://github.com/PaddlePaddle/Paddle/pull/62533) -- Improve the mechanism of composite operators, and optimize composite logic for some operators. 
[#59448](https://github.com/PaddlePaddle/Paddle/pull/59448), [#60505](https://github.com/PaddlePaddle/Paddle/pull/60505), [#59891](https://github.com/PaddlePaddle/Paddle/pull/59891), [#63161](https://github.com/PaddlePaddle/Paddle/pull/63161), [#63245](https://github.com/PaddlePaddle/Paddle/pull/63245), [#63782](https://github.com/PaddlePaddle/Paddle/pull/63782), [#64346](https://github.com/PaddlePaddle/Paddle/pull/64346), [#63156](https://github.com/PaddlePaddle/Paddle/pull/63156), [#63171](https://github.com/PaddlePaddle/Paddle/pull/63171), [#61315](https://github.com/PaddlePaddle/Paddle/pull/61315), [#61701](https://github.com/PaddlePaddle/Paddle/pull/61701), [#61874](https://github.com/PaddlePaddle/Paddle/pull/61874), [#61873](https://github.com/PaddlePaddle/Paddle/pull/61873), [#62059](https://github.com/PaddlePaddle/Paddle/pull/62059), [#61912](https://github.com/PaddlePaddle/Paddle/pull/61912), [#62112](https://github.com/PaddlePaddle/Paddle/pull/62112), [#63011](https://github.com/PaddlePaddle/Paddle/pull/63011), [#63009](https://github.com/PaddlePaddle/Paddle/pull/63009), [#64714](https://github.com/PaddlePaddle/Paddle/pull/64714) +### Developer -### Bug Fixing +- Fix issues in dynamic-to-static conversion, improve the overall graph conversion success rate, and optimize the inference export experience.
[#65291](https://github.com/PaddlePaddle/Paddle/pull/65291), [#66153](https://github.com/PaddlePaddle/Paddle/pull/66153), [#66379](https://github.com/PaddlePaddle/Paddle/pull/66379), [#66557](https://github.com/PaddlePaddle/Paddle/pull/66557), [#67021](https://github.com/PaddlePaddle/Paddle/pull/67021), [#67482](https://github.com/PaddlePaddle/Paddle/pull/67482), [#67495](https://github.com/PaddlePaddle/Paddle/pull/67495), [#67981](https://github.com/PaddlePaddle/Paddle/pull/67981), [#68030](https://github.com/PaddlePaddle/Paddle/pull/68030), [#68078](https://github.com/PaddlePaddle/Paddle/pull/68078), [#68328](https://github.com/PaddlePaddle/Paddle/pull/68328), [#68442](https://github.com/PaddlePaddle/Paddle/pull/68442), [#68679](https://github.com/PaddlePaddle/Paddle/pull/68679), [#68850](https://github.com/PaddlePaddle/Paddle/pull/68850), [#68892](https://github.com/PaddlePaddle/Paddle/pull/68892), [#68991](https://github.com/PaddlePaddle/Paddle/pull/68991), [#69043](https://github.com/PaddlePaddle/Paddle/pull/69043), [#69097](https://github.com/PaddlePaddle/Paddle/pull/69097), [#69210](https://github.com/PaddlePaddle/Paddle/pull/69210), [#69295](https://github.com/PaddlePaddle/Paddle/pull/69295), [#69428](https://github.com/PaddlePaddle/Paddle/pull/69428), [#69518](https://github.com/PaddlePaddle/Paddle/pull/69518), [#69642](https://github.com/PaddlePaddle/Paddle/pull/69642), [#69940](https://github.com/PaddlePaddle/Paddle/pull/69940), [#70118](https://github.com/PaddlePaddle/Paddle/pull/70118), [#70169](https://github.com/PaddlePaddle/Paddle/pull/70169), [#70218](https://github.com/PaddlePaddle/Paddle/pull/70218), [#70287](https://github.com/PaddlePaddle/Paddle/pull/70287), [#70412](https://github.com/PaddlePaddle/Paddle/pull/70412), [#71099](https://github.com/PaddlePaddle/Paddle/pull/71099), [#71156](https://github.com/PaddlePaddle/Paddle/pull/71156), [#71193](https://github.com/PaddlePaddle/Paddle/pull/71193), 
[#71336](https://github.com/PaddlePaddle/Paddle/pull/71336), [#71463](https://github.com/PaddlePaddle/Paddle/pull/71463), [#71476](https://github.com/PaddlePaddle/Paddle/pull/71476), [#71503](https://github.com/PaddlePaddle/Paddle/pull/71503) +- Inplace strategy upgrade. [#65491](https://github.com/PaddlePaddle/Paddle/pull/65491) +- Control flow related development. [#67251](https://github.com/PaddlePaddle/Paddle/pull/67251) +- Add environment variables. [#68467](https://github.com/PaddlePaddle/Paddle/pull/68467) +- Support sparse operator operations. [#67111](https://github.com/PaddlePaddle/Paddle/pull/67111) +- Other execution support development, including logic optimization, version adaptation, and adding unit tests. [#69241](https://github.com/PaddlePaddle/Paddle/pull/69241), [#69806](https://github.com/PaddlePaddle/Paddle/pull/69806), [#70768](https://github.com/PaddlePaddle/Paddle/pull/70768), [#66829](https://github.com/PaddlePaddle/Paddle/pull/66829), [#67110](https://github.com/PaddlePaddle/Paddle/pull/67110), [#67442](https://github.com/PaddlePaddle/Paddle/pull/67442), [#67041](https://github.com/PaddlePaddle/Paddle/pull/67041), [#67452](https://github.com/PaddlePaddle/Paddle/pull/67452), [#69061](https://github.com/PaddlePaddle/Paddle/pull/69061), [#69307](https://github.com/PaddlePaddle/Paddle/pull/69307), [#68669](https://github.com/PaddlePaddle/Paddle/pull/68669), [#69829](https://github.com/PaddlePaddle/Paddle/pull/69829), [#70003](https://github.com/PaddlePaddle/Paddle/pull/70003), [#70443](https://github.com/PaddlePaddle/Paddle/pull/70443), [#70364](https://github.com/PaddlePaddle/Paddle/pull/70364), [#71495](https://github.com/PaddlePaddle/Paddle/pull/71495) -- Fix the bugs related to PIR, actuator, and dynamic-to-static. 
[#64442](https://github.com/PaddlePaddle/Paddle/pull/64442), [#60443](https://github.com/PaddlePaddle/Paddle/pull/60443), [#60122](https://github.com/PaddlePaddle/Paddle/pull/60122), [#60625](https://github.com/PaddlePaddle/Paddle/pull/60625), [#60607](https://github.com/PaddlePaddle/Paddle/pull/60607), [#60705](https://github.com/PaddlePaddle/Paddle/pull/60705), [#61110](https://github.com/PaddlePaddle/Paddle/pull/61110), [#61278](https://github.com/PaddlePaddle/Paddle/pull/61278), [#61448](https://github.com/PaddlePaddle/Paddle/pull/61448), [#61491](https://github.com/PaddlePaddle/Paddle/pull/61491), [#61692](https://github.com/PaddlePaddle/Paddle/pull/61692), [#62100](https://github.com/PaddlePaddle/Paddle/pull/62100), [#62239](https://github.com/PaddlePaddle/Paddle/pull/62239), [#62365](https://github.com/PaddlePaddle/Paddle/pull/62365), [#62758](https://github.com/PaddlePaddle/Paddle/pull/62758), [#63395](https://github.com/PaddlePaddle/Paddle/pull/63395), [#64272](https://github.com/PaddlePaddle/Paddle/pull/64272), [#62165](https://github.com/PaddlePaddle/Paddle/pull/62165), [#64151](https://github.com/PaddlePaddle/Paddle/pull/64151), [#64204](https://github.com/PaddlePaddle/Paddle/pull/64204), [#64815](https://github.com/PaddlePaddle/Paddle/pull/64815), [#63757](https://github.com/PaddlePaddle/Paddle/pull/63757), [#61972](https://github.com/PaddlePaddle/Paddle/pull/61972), [#64806](https://github.com/PaddlePaddle/Paddle/pull/64806), [#60010](https://github.com/PaddlePaddle/Paddle/pull/60010), [#60461](https://github.com/PaddlePaddle/Paddle/pull/60461), [#60310](https://github.com/PaddlePaddle/Paddle/pull/60310), [#62006](https://github.com/PaddlePaddle/Paddle/pull/62006), [#61591](https://github.com/PaddlePaddle/Paddle/pull/61591), [#60327](https://github.com/PaddlePaddle/Paddle/pull/60327), [#60720](https://github.com/PaddlePaddle/Paddle/pull/60720), [#64656](https://github.com/PaddlePaddle/Paddle/pull/64656), 
[#60236](https://github.com/PaddlePaddle/Paddle/pull/60236), [#60684](https://github.com/PaddlePaddle/Paddle/pull/60684), [#60790](https://github.com/PaddlePaddle/Paddle/pull/60790), [#60944](https://github.com/PaddlePaddle/Paddle/pull/60944), [#62056](https://github.com/PaddlePaddle/Paddle/pull/62056), [#62891](https://github.com/PaddlePaddle/Paddle/pull/62891), [#64676](https://github.com/PaddlePaddle/Paddle/pull/64676), [#60271](https://github.com/PaddlePaddle/Paddle/pull/60271), [#60634](https://github.com/PaddlePaddle/Paddle/pull/60634), [#60663](https://github.com/PaddlePaddle/Paddle/pull/60663), [#60827](https://github.com/PaddlePaddle/Paddle/pull/60827), [#60845](https://github.com/PaddlePaddle/Paddle/pull/60845), [#60905](https://github.com/PaddlePaddle/Paddle/pull/60905), [#60945](https://github.com/PaddlePaddle/Paddle/pull/60945), [#60949](https://github.com/PaddlePaddle/Paddle/pull/60949), [#61107](https://github.com/PaddlePaddle/Paddle/pull/61107), [#61111](https://github.com/PaddlePaddle/Paddle/pull/61111), [#61117](https://github.com/PaddlePaddle/Paddle/pull/61117), [#61158](https://github.com/PaddlePaddle/Paddle/pull/61158), [#61177](https://github.com/PaddlePaddle/Paddle/pull/61177), [#61355](https://github.com/PaddlePaddle/Paddle/pull/61355), [#61593](https://github.com/PaddlePaddle/Paddle/pull/61593), [#61666](https://github.com/PaddlePaddle/Paddle/pull/61666), [#61934](https://github.com/PaddlePaddle/Paddle/pull/61934), [#62216](https://github.com/PaddlePaddle/Paddle/pull/62216), [#62491](https://github.com/PaddlePaddle/Paddle/pull/62491), [#62515](https://github.com/PaddlePaddle/Paddle/pull/62515), [#62594](https://github.com/PaddlePaddle/Paddle/pull/62594), [#62605](https://github.com/PaddlePaddle/Paddle/pull/62605), [#62895](https://github.com/PaddlePaddle/Paddle/pull/62895), [#62913](https://github.com/PaddlePaddle/Paddle/pull/62913), [#64413](https://github.com/PaddlePaddle/Paddle/pull/64413), 
[#59947](https://github.com/PaddlePaddle/Paddle/pull/59947), [#60264](https://github.com/PaddlePaddle/Paddle/pull/60264), [#60721](https://github.com/PaddlePaddle/Paddle/pull/60721), [#63113](https://github.com/PaddlePaddle/Paddle/pull/63113), [#63629](https://github.com/PaddlePaddle/Paddle/pull/63629), [#64300](https://github.com/PaddlePaddle/Paddle/pull/64300), [#64450](https://github.com/PaddlePaddle/Paddle/pull/64450), [#64532](https://github.com/PaddlePaddle/Paddle/pull/64532), [#64561](https://github.com/PaddlePaddle/Paddle/pull/64561), [#64625](https://github.com/PaddlePaddle/Paddle/pull/64625), [#64731](https://github.com/PaddlePaddle/Paddle/pull/64731), [#60059](https://github.com/PaddlePaddle/Paddle/pull/60059), [#60487](https://github.com/PaddlePaddle/Paddle/pull/60487), [#60423](https://github.com/PaddlePaddle/Paddle/pull/60423), [#61599](https://github.com/PaddlePaddle/Paddle/pull/61599), [#62032](https://github.com/PaddlePaddle/Paddle/pull/62032), [#62686](https://github.com/PaddlePaddle/Paddle/pull/62686), [#64055](https://github.com/PaddlePaddle/Paddle/pull/64055), [#60751](https://github.com/PaddlePaddle/Paddle/pull/60751), [#61646](https://github.com/PaddlePaddle/Paddle/pull/61646), [#60454](https://github.com/PaddlePaddle/Paddle/pull/60454), [#62530](https://github.com/PaddlePaddle/Paddle/pull/62530), [#62821](https://github.com/PaddlePaddle/Paddle/pull/62821), [#64454](https://github.com/PaddlePaddle/Paddle/pull/64454), [#64754](https://github.com/PaddlePaddle/Paddle/pull/64754), [#59860](https://github.com/PaddlePaddle/Paddle/pull/59860), [#60280](https://github.com/PaddlePaddle/Paddle/pull/60280), [#60357](https://github.com/PaddlePaddle/Paddle/pull/60357), [#60363](https://github.com/PaddlePaddle/Paddle/pull/60363), [#60900](https://github.com/PaddlePaddle/Paddle/pull/60900), [#61185](https://github.com/PaddlePaddle/Paddle/pull/61185), [#61505](https://github.com/PaddlePaddle/Paddle/pull/61505), 
[#61644](https://github.com/PaddlePaddle/Paddle/pull/61644), [#62256](https://github.com/PaddlePaddle/Paddle/pull/62256), [#62396](https://github.com/PaddlePaddle/Paddle/pull/62396), [#63040](https://github.com/PaddlePaddle/Paddle/pull/63040), [#63409](https://github.com/PaddlePaddle/Paddle/pull/63409), [#63764](https://github.com/PaddlePaddle/Paddle/pull/63764), [#59571](https://github.com/PaddlePaddle/Paddle/pull/59571), [#59894](https://github.com/PaddlePaddle/Paddle/pull/59894), [#59569](https://github.com/PaddlePaddle/Paddle/pull/59569), [#59896](https://github.com/PaddlePaddle/Paddle/pull/59896), [#60015](https://github.com/PaddlePaddle/Paddle/pull/60015), [#60081](https://github.com/PaddlePaddle/Paddle/pull/60081), [#60164](https://github.com/PaddlePaddle/Paddle/pull/60164), [#60200](https://github.com/PaddlePaddle/Paddle/pull/60200), [#60211](https://github.com/PaddlePaddle/Paddle/pull/60211), [#60267](https://github.com/PaddlePaddle/Paddle/pull/60267), [#60458](https://github.com/PaddlePaddle/Paddle/pull/60458), [#60395](https://github.com/PaddlePaddle/Paddle/pull/60395), [#60907](https://github.com/PaddlePaddle/Paddle/pull/60907), [#60707](https://github.com/PaddlePaddle/Paddle/pull/60707), [#60993](https://github.com/PaddlePaddle/Paddle/pull/60993), [#61401](https://github.com/PaddlePaddle/Paddle/pull/61401), [#61433](https://github.com/PaddlePaddle/Paddle/pull/61433), [#61450](https://github.com/PaddlePaddle/Paddle/pull/61450), [#61577](https://github.com/PaddlePaddle/Paddle/pull/61577), [#61575](https://github.com/PaddlePaddle/Paddle/pull/61575), [#61703](https://github.com/PaddlePaddle/Paddle/pull/61703), [#61711](https://github.com/PaddlePaddle/Paddle/pull/61711), [#61883](https://github.com/PaddlePaddle/Paddle/pull/61883), [#61822](https://github.com/PaddlePaddle/Paddle/pull/61822), [#62012](https://github.com/PaddlePaddle/Paddle/pull/62012), [#61858](https://github.com/PaddlePaddle/Paddle/pull/61858), 
[#62176](https://github.com/PaddlePaddle/Paddle/pull/62176), [#62257](https://github.com/PaddlePaddle/Paddle/pull/62257), [#62470](https://github.com/PaddlePaddle/Paddle/pull/62470), [#62536](https://github.com/PaddlePaddle/Paddle/pull/62536), [#62606](https://github.com/PaddlePaddle/Paddle/pull/62606), [#62808](https://github.com/PaddlePaddle/Paddle/pull/62808), [#62854](https://github.com/PaddlePaddle/Paddle/pull/62854), [#62879](https://github.com/PaddlePaddle/Paddle/pull/62879), [#62864](https://github.com/PaddlePaddle/Paddle/pull/62864), [#63063](https://github.com/PaddlePaddle/Paddle/pull/63063), [#62958](https://github.com/PaddlePaddle/Paddle/pull/62958), [#63397](https://github.com/PaddlePaddle/Paddle/pull/63397), [#63805](https://github.com/PaddlePaddle/Paddle/pull/63805), [#63694](https://github.com/PaddlePaddle/Paddle/pull/63694), [#64168](https://github.com/PaddlePaddle/Paddle/pull/64168), [#64184](https://github.com/PaddlePaddle/Paddle/pull/64184), [#64174](https://github.com/PaddlePaddle/Paddle/pull/64174), [#64315](https://github.com/PaddlePaddle/Paddle/pull/64315), [#64362](https://github.com/PaddlePaddle/Paddle/pull/64362), [#64400](https://github.com/PaddlePaddle/Paddle/pull/64400), [#64475](https://github.com/PaddlePaddle/Paddle/pull/64475), [#64458](https://github.com/PaddlePaddle/Paddle/pull/64458), [#64548](https://github.com/PaddlePaddle/Paddle/pull/64548), [#59858](https://github.com/PaddlePaddle/Paddle/pull/59858), [#61132](https://github.com/PaddlePaddle/Paddle/pull/61132), [#62010](https://github.com/PaddlePaddle/Paddle/pull/62010), [#62069](https://github.com/PaddlePaddle/Paddle/pull/62069), [#62707](https://github.com/PaddlePaddle/Paddle/pull/62707), [#62921](https://github.com/PaddlePaddle/Paddle/pull/62921), [#63085](https://github.com/PaddlePaddle/Paddle/pull/63085), [#63321](https://github.com/PaddlePaddle/Paddle/pull/63321), [#63351](https://github.com/PaddlePaddle/Paddle/pull/63351), 
[#63549](https://github.com/PaddlePaddle/Paddle/pull/63549), [#64567](https://github.com/PaddlePaddle/Paddle/pull/64567), [#59936](https://github.com/PaddlePaddle/Paddle/pull/59936), [#60269](https://github.com/PaddlePaddle/Paddle/pull/60269), [#60879](https://github.com/PaddlePaddle/Paddle/pull/60879), [#61314](https://github.com/PaddlePaddle/Paddle/pull/61314), [#61391](https://github.com/PaddlePaddle/Paddle/pull/61391), [#61479](https://github.com/PaddlePaddle/Paddle/pull/61479), [#61789](https://github.com/PaddlePaddle/Paddle/pull/61789), [#61832](https://github.com/PaddlePaddle/Paddle/pull/61832), [#61864](https://github.com/PaddlePaddle/Paddle/pull/61864), [#61917](https://github.com/PaddlePaddle/Paddle/pull/61917), [#62052](https://github.com/PaddlePaddle/Paddle/pull/62052), [#62068](https://github.com/PaddlePaddle/Paddle/pull/62068), [#62293](https://github.com/PaddlePaddle/Paddle/pull/62293), [#62479](https://github.com/PaddlePaddle/Paddle/pull/62479), [#62506](https://github.com/PaddlePaddle/Paddle/pull/62506), [#59948](https://github.com/PaddlePaddle/Paddle/pull/59948), [#64118](https://github.com/PaddlePaddle/Paddle/pull/64118), [#64126](https://github.com/PaddlePaddle/Paddle/pull/64126), [#64195](https://github.com/PaddlePaddle/Paddle/pull/64195), [#64307](https://github.com/PaddlePaddle/Paddle/pull/64307), [#64314](https://github.com/PaddlePaddle/Paddle/pull/64314), [#64276](https://github.com/PaddlePaddle/Paddle/pull/64276), [#64312](https://github.com/PaddlePaddle/Paddle/pull/64312), [#64350](https://github.com/PaddlePaddle/Paddle/pull/64350), [#64319](https://github.com/PaddlePaddle/Paddle/pull/64319), [#64463](https://github.com/PaddlePaddle/Paddle/pull/64463), [#64457](https://github.com/PaddlePaddle/Paddle/pull/64457), [#64455](https://github.com/PaddlePaddle/Paddle/pull/64455), [#64487](https://github.com/PaddlePaddle/Paddle/pull/64487), [#64645](https://github.com/PaddlePaddle/Paddle/pull/64645), 
[#63155](https://github.com/PaddlePaddle/Paddle/pull/63155), [#59893](https://github.com/PaddlePaddle/Paddle/pull/59893), [#63332](https://github.com/PaddlePaddle/Paddle/pull/63332), [#63332](https://github.com/PaddlePaddle/Paddle/pull/63332), [#64786](https://github.com/PaddlePaddle/Paddle/pull/64786), [#60515](https://github.com/PaddlePaddle/Paddle/pull/60515), [#60627](https://github.com/PaddlePaddle/Paddle/pull/60627), [#60863](https://github.com/PaddlePaddle/Paddle/pull/60863), [#60854](https://github.com/PaddlePaddle/Paddle/pull/60854), [#61447](https://github.com/PaddlePaddle/Paddle/pull/61447), [#61440](https://github.com/PaddlePaddle/Paddle/pull/61440), [#61932](https://github.com/PaddlePaddle/Paddle/pull/61932), [#62131](https://github.com/PaddlePaddle/Paddle/pull/62131), [#62252](https://github.com/PaddlePaddle/Paddle/pull/62252), [#62283](https://github.com/PaddlePaddle/Paddle/pull/62283), [#62358](https://github.com/PaddlePaddle/Paddle/pull/62358), [#62411](https://github.com/PaddlePaddle/Paddle/pull/62411), [#62424](https://github.com/PaddlePaddle/Paddle/pull/62424), [#62810](https://github.com/PaddlePaddle/Paddle/pull/62810), [#62811](https://github.com/PaddlePaddle/Paddle/pull/62811), [#62896](https://github.com/PaddlePaddle/Paddle/pull/62896), [#62947](https://github.com/PaddlePaddle/Paddle/pull/62947), [#63182](https://github.com/PaddlePaddle/Paddle/pull/63182), [#63190](https://github.com/PaddlePaddle/Paddle/pull/63190), [#63294](https://github.com/PaddlePaddle/Paddle/pull/63294), [#63306](https://github.com/PaddlePaddle/Paddle/pull/63306), [#63352](https://github.com/PaddlePaddle/Paddle/pull/63352), [#63404](https://github.com/PaddlePaddle/Paddle/pull/63404), [#63474](https://github.com/PaddlePaddle/Paddle/pull/63474), [#64013](https://github.com/PaddlePaddle/Paddle/pull/64013), 
[#64674](https://github.com/PaddlePaddle/Paddle/pull/64674),[#60055](https://github.com/PaddlePaddle/Paddle/pull/60055),[#62050](https://github.com/PaddlePaddle/Paddle/pull/62050),[#62770](https://github.com/PaddlePaddle/Paddle/pull/62770),[#63234](https://github.com/PaddlePaddle/Paddle/pull/63234),[#63374](https://github.com/PaddlePaddle/Paddle/pull/63374),[#64277](https://github.com/PaddlePaddle/Paddle/pull/64277), [#63420](https://github.com/PaddlePaddle/Paddle/pull/63420), [#60312](https://github.com/PaddlePaddle/Paddle/pull/60312), [#63810](https://github.com/PaddlePaddle/Paddle/pull/63810), [#64631](https://github.com/PaddlePaddle/Paddle/pull/64631), [#63970](https://github.com/PaddlePaddle/Paddle/pull/63970), [#63708](https://github.com/PaddlePaddle/Paddle/pull/63708), [#62062](https://github.com/PaddlePaddle/Paddle/pull/62062), [#60898](https://github.com/PaddlePaddle/Paddle/pull/60898), [#62373](https://github.com/PaddlePaddle/Paddle/pull/62373), [#59878](https://github.com/PaddlePaddle/Paddle/pull/59878) -- Fix some bugs in operator mechanism, operator implementation logic and related unit tests. 
[#63792](https://github.com/PaddlePaddle/Paddle/pull/63792), [#60570](https://github.com/PaddlePaddle/Paddle/pull/60570), [#61572](https://github.com/PaddlePaddle/Paddle/pull/61572), [#59971](https://github.com/PaddlePaddle/Paddle/pull/59971), [#61336](https://github.com/PaddlePaddle/Paddle/pull/61336), [#63276](https://github.com/PaddlePaddle/Paddle/pull/63276), [#63251](https://github.com/PaddlePaddle/Paddle/pull/63251), [#63697](https://github.com/PaddlePaddle/Paddle/pull/63697), [#63706](https://github.com/PaddlePaddle/Paddle/pull/63706), [#64685](https://github.com/PaddlePaddle/Paddle/pull/64685), [#64009](https://github.com/PaddlePaddle/Paddle/pull/64009), [#62461](https://github.com/PaddlePaddle/Paddle/pull/62461), [#61568](https://github.com/PaddlePaddle/Paddle/pull/61568), [#63912](https://github.com/PaddlePaddle/Paddle/pull/63912), [#60475](https://github.com/PaddlePaddle/Paddle/pull/60475), [#60222](https://github.com/PaddlePaddle/Paddle/pull/60222), [#63961](https://github.com/PaddlePaddle/Paddle/pull/63961), [#63593](https://github.com/PaddlePaddle/Paddle/pull/63593) +### Performance optimization -### Developer Content +- Optimize dynamic shape handling in static graph conversion, reducing graph construction iterations and compilation time. 
[#65235](https://github.com/PaddlePaddle/Paddle/pull/65235), [#65477](https://github.com/PaddlePaddle/Paddle/pull/65477), [#65517](https://github.com/PaddlePaddle/Paddle/pull/65517), [#65882](https://github.com/PaddlePaddle/Paddle/pull/65882), [#66346](https://github.com/PaddlePaddle/Paddle/pull/66346), [#66746](https://github.com/PaddlePaddle/Paddle/pull/66746), [#67786](https://github.com/PaddlePaddle/Paddle/pull/67786), [#67876](https://github.com/PaddlePaddle/Paddle/pull/67876), [#68113](https://github.com/PaddlePaddle/Paddle/pull/68113), [#68302](https://github.com/PaddlePaddle/Paddle/pull/68302), [#68337](https://github.com/PaddlePaddle/Paddle/pull/68337), [#68616](https://github.com/PaddlePaddle/Paddle/pull/68616), [#69354](https://github.com/PaddlePaddle/Paddle/pull/69354), [#70009](https://github.com/PaddlePaddle/Paddle/pull/70009), [#70877](https://github.com/PaddlePaddle/Paddle/pull/70877) +- End-to-end performance optimization for SOT, minimizing subgraph fragmentation, reducing scheduling overhead, and improving static training efficiency. 
[#67591](https://github.com/PaddlePaddle/Paddle/pull/67591), [#67746](https://github.com/PaddlePaddle/Paddle/pull/67746), [#67823](https://github.com/PaddlePaddle/Paddle/pull/67823), [#67890](https://github.com/PaddlePaddle/Paddle/pull/67890), [#67921](https://github.com/PaddlePaddle/Paddle/pull/67921), [#68031](https://github.com/PaddlePaddle/Paddle/pull/68031), [#68153](https://github.com/PaddlePaddle/Paddle/pull/68153), [#68729](https://github.com/PaddlePaddle/Paddle/pull/68729), [#69249](https://github.com/PaddlePaddle/Paddle/pull/69249), [#69263](https://github.com/PaddlePaddle/Paddle/pull/69263), [#69300](https://github.com/PaddlePaddle/Paddle/pull/69300), [#69313](https://github.com/PaddlePaddle/Paddle/pull/69313), [#69325](https://github.com/PaddlePaddle/Paddle/pull/69325), [#69353](https://github.com/PaddlePaddle/Paddle/pull/69353), [#69411](https://github.com/PaddlePaddle/Paddle/pull/69411), [#69506](https://github.com/PaddlePaddle/Paddle/pull/69506), [#69672](https://github.com/PaddlePaddle/Paddle/pull/69672), [#69746](https://github.com/PaddlePaddle/Paddle/pull/69746), [#69834](https://github.com/PaddlePaddle/Paddle/pull/69834), [#69836](https://github.com/PaddlePaddle/Paddle/pull/69836), [#69852](https://github.com/PaddlePaddle/Paddle/pull/69852), [#69975](https://github.com/PaddlePaddle/Paddle/pull/69975), [#70151](https://github.com/PaddlePaddle/Paddle/pull/70151), [#70293](https://github.com/PaddlePaddle/Paddle/pull/70293), [#70405](https://github.com/PaddlePaddle/Paddle/pull/70405), [#70851](https://github.com/PaddlePaddle/Paddle/pull/70851), [#71039](https://github.com/PaddlePaddle/Paddle/pull/71039), [#71254](https://github.com/PaddlePaddle/Paddle/pull/71254), [#71295](https://github.com/PaddlePaddle/Paddle/pull/71295), [#71298](https://github.com/PaddlePaddle/Paddle/pull/71298), [#71346](https://github.com/PaddlePaddle/Paddle/pull/71346), [#71377](https://github.com/PaddlePaddle/Paddle/pull/71377), 
[#71407](https://github.com/PaddlePaddle/Paddle/pull/71407) +- Optimize the performance of dynamic shape scenarios. [#68491](https://github.com/PaddlePaddle/Paddle/pull/68491), [#68629](https://github.com/PaddlePaddle/Paddle/pull/68629) +- Accelerate the execution speed of PIR executor. [#69513](https://github.com/PaddlePaddle/Paddle/pull/69513) +- Optimize PIR saving and loading performance. [#69683](https://github.com/PaddlePaddle/Paddle/pull/69683) +- Optimize for device. [#69676](https://github.com/PaddlePaddle/Paddle/pull/69676) +- Clean up redundant input and output information. [#66278](https://github.com/PaddlePaddle/Paddle/pull/66278) -- Developer related contents include PIR switching, unit test start, function verification and other PR. [#60621](https://github.com/PaddlePaddle/Paddle/pull/60621), [#59703](https://github.com/PaddlePaddle/Paddle/pull/59703), [#59694](https://github.com/PaddlePaddle/Paddle/pull/59694), [#59717](https://github.com/PaddlePaddle/Paddle/pull/59717), [#59729](https://github.com/PaddlePaddle/Paddle/pull/59729), [#59730](https://github.com/PaddlePaddle/Paddle/pull/59730), [#60216](https://github.com/PaddlePaddle/Paddle/pull/60216), [#60238](https://github.com/PaddlePaddle/Paddle/pull/60238), [#60246](https://github.com/PaddlePaddle/Paddle/pull/60246), [#60343](https://github.com/PaddlePaddle/Paddle/pull/60343), [#60302](https://github.com/PaddlePaddle/Paddle/pull/60302), [#60870](https://github.com/PaddlePaddle/Paddle/pull/60870), [#59956](https://github.com/PaddlePaddle/Paddle/pull/59956), [#60795](https://github.com/PaddlePaddle/Paddle/pull/60795), [#62528](https://github.com/PaddlePaddle/Paddle/pull/62528), [#59932](https://github.com/PaddlePaddle/Paddle/pull/59932), [#59636](https://github.com/PaddlePaddle/Paddle/pull/59636), [#59959](https://github.com/PaddlePaddle/Paddle/pull/59959), [#59734](https://github.com/PaddlePaddle/Paddle/pull/59734), [#60287](https://github.com/PaddlePaddle/Paddle/pull/60287), 
[#60347](https://github.com/PaddlePaddle/Paddle/pull/60347), [#60335](https://github.com/PaddlePaddle/Paddle/pull/60335), [#60332](https://github.com/PaddlePaddle/Paddle/pull/60332), [#59631](https://github.com/PaddlePaddle/Paddle/pull/59631), [#60255](https://github.com/PaddlePaddle/Paddle/pull/60255), [#60329](https://github.com/PaddlePaddle/Paddle/pull/60329), [#60401](https://github.com/PaddlePaddle/Paddle/pull/60401), [#60522](https://github.com/PaddlePaddle/Paddle/pull/60522), [#60792](https://github.com/PaddlePaddle/Paddle/pull/60792), [#59617](https://github.com/PaddlePaddle/Paddle/pull/59617), [#60277](https://github.com/PaddlePaddle/Paddle/pull/60277), [#60584](https://github.com/PaddlePaddle/Paddle/pull/60584), [#60911](https://github.com/PaddlePaddle/Paddle/pull/60911), [#61322](https://github.com/PaddlePaddle/Paddle/pull/61322), [#60838](https://github.com/PaddlePaddle/Paddle/pull/60838), [#60602](https://github.com/PaddlePaddle/Paddle/pull/60602), [#61458](https://github.com/PaddlePaddle/Paddle/pull/61458), [#61607](https://github.com/PaddlePaddle/Paddle/pull/61607), [#61960](https://github.com/PaddlePaddle/Paddle/pull/61960), [#60484](https://github.com/PaddlePaddle/Paddle/pull/60484), [#61662](https://github.com/PaddlePaddle/Paddle/pull/61662), [#62263](https://github.com/PaddlePaddle/Paddle/pull/62263), [#62270](https://github.com/PaddlePaddle/Paddle/pull/62270), [#62469](https://github.com/PaddlePaddle/Paddle/pull/62469), [#62416](https://github.com/PaddlePaddle/Paddle/pull/62416), [#62443](https://github.com/PaddlePaddle/Paddle/pull/62443), [#62412](https://github.com/PaddlePaddle/Paddle/pull/62412), [#62541](https://github.com/PaddlePaddle/Paddle/pull/62541), [#62634](https://github.com/PaddlePaddle/Paddle/pull/62634), [#62369](https://github.com/PaddlePaddle/Paddle/pull/62369), [#60805](https://github.com/PaddlePaddle/Paddle/pull/60805), [#62644](https://github.com/PaddlePaddle/Paddle/pull/62644), 
[#62494](https://github.com/PaddlePaddle/Paddle/pull/62494), [#62767](https://github.com/PaddlePaddle/Paddle/pull/62767), [#62735](https://github.com/PaddlePaddle/Paddle/pull/62735), [#62802](https://github.com/PaddlePaddle/Paddle/pull/62802), [#62801](https://github.com/PaddlePaddle/Paddle/pull/62801), [#62783](https://github.com/PaddlePaddle/Paddle/pull/62783), [#62579](https://github.com/PaddlePaddle/Paddle/pull/62579), [#62833](https://github.com/PaddlePaddle/Paddle/pull/62833), [#62668](https://github.com/PaddlePaddle/Paddle/pull/62668), [#62972](https://github.com/PaddlePaddle/Paddle/pull/62972), [#62505](https://github.com/PaddlePaddle/Paddle/pull/62505), [#63005](https://github.com/PaddlePaddle/Paddle/pull/63005), [#62900](https://github.com/PaddlePaddle/Paddle/pull/62900), [#60577](https://github.com/PaddlePaddle/Paddle/pull/60577), [#60877](https://github.com/PaddlePaddle/Paddle/pull/60877), [#61076](https://github.com/PaddlePaddle/Paddle/pull/61076), [#61038](https://github.com/PaddlePaddle/Paddle/pull/61038), [#61112](https://github.com/PaddlePaddle/Paddle/pull/61112), [#61120](https://github.com/PaddlePaddle/Paddle/pull/61120), [#61582](https://github.com/PaddlePaddle/Paddle/pull/61582), [#61119](https://github.com/PaddlePaddle/Paddle/pull/61119), [#61036](https://github.com/PaddlePaddle/Paddle/pull/61036), [#61289](https://github.com/PaddlePaddle/Paddle/pull/61289), [#60695](https://github.com/PaddlePaddle/Paddle/pull/60695), [#61039](https://github.com/PaddlePaddle/Paddle/pull/61039), [#61963](https://github.com/PaddlePaddle/Paddle/pull/61963), [#62118](https://github.com/PaddlePaddle/Paddle/pull/62118), [#62797](https://github.com/PaddlePaddle/Paddle/pull/62797), [#62807](https://github.com/PaddlePaddle/Paddle/pull/62807), [#62887](https://github.com/PaddlePaddle/Paddle/pull/62887), [#62830](https://github.com/PaddlePaddle/Paddle/pull/62830), [#62849](https://github.com/PaddlePaddle/Paddle/pull/62849), 
[#62750](https://github.com/PaddlePaddle/Paddle/pull/62750), [#62965](https://github.com/PaddlePaddle/Paddle/pull/62965), [#59742](https://github.com/PaddlePaddle/Paddle/pull/59742), [#59867](https://github.com/PaddlePaddle/Paddle/pull/59867), [#60836](https://github.com/PaddlePaddle/Paddle/pull/60836), [#60902](https://github.com/PaddlePaddle/Paddle/pull/60902), [#61228](https://github.com/PaddlePaddle/Paddle/pull/61228), [#60037](https://github.com/PaddlePaddle/Paddle/pull/60037), [#60079](https://github.com/PaddlePaddle/Paddle/pull/60079), [#60173](https://github.com/PaddlePaddle/Paddle/pull/60173), [#60373](https://github.com/PaddlePaddle/Paddle/pull/60373), [#60380](https://github.com/PaddlePaddle/Paddle/pull/60380), [#60381](https://github.com/PaddlePaddle/Paddle/pull/60381), [#60750](https://github.com/PaddlePaddle/Paddle/pull/60750), [#61065](https://github.com/PaddlePaddle/Paddle/pull/61065), [#61122](https://github.com/PaddlePaddle/Paddle/pull/61122), [#61074](https://github.com/PaddlePaddle/Paddle/pull/61074), [#61204](https://github.com/PaddlePaddle/Paddle/pull/61204), [#61191](https://github.com/PaddlePaddle/Paddle/pull/61191), [#61182](https://github.com/PaddlePaddle/Paddle/pull/61182), [#61219](https://github.com/PaddlePaddle/Paddle/pull/61219), [#61296](https://github.com/PaddlePaddle/Paddle/pull/61296), [#61503](https://github.com/PaddlePaddle/Paddle/pull/61503), [#61484](https://github.com/PaddlePaddle/Paddle/pull/61484), [#61513](https://github.com/PaddlePaddle/Paddle/pull/61513), [#61476](https://github.com/PaddlePaddle/Paddle/pull/61476), [#61510](https://github.com/PaddlePaddle/Paddle/pull/61510), [#61511](https://github.com/PaddlePaddle/Paddle/pull/61511), [#61526](https://github.com/PaddlePaddle/Paddle/pull/61526), [#61524](https://github.com/PaddlePaddle/Paddle/pull/61524), [#61525](https://github.com/PaddlePaddle/Paddle/pull/61525), [#61466](https://github.com/PaddlePaddle/Paddle/pull/61466), 
[#61497](https://github.com/PaddlePaddle/Paddle/pull/61497), [#61538](https://github.com/PaddlePaddle/Paddle/pull/61538), [#61533](https://github.com/PaddlePaddle/Paddle/pull/61533), [#61530](https://github.com/PaddlePaddle/Paddle/pull/61530), [#61468](https://github.com/PaddlePaddle/Paddle/pull/61468), [#61527](https://github.com/PaddlePaddle/Paddle/pull/61527), [#61535](https://github.com/PaddlePaddle/Paddle/pull/61535), [#61512](https://github.com/PaddlePaddle/Paddle/pull/61512), [#61531](https://github.com/PaddlePaddle/Paddle/pull/61531), [#61539](https://github.com/PaddlePaddle/Paddle/pull/61539), [#61532](https://github.com/PaddlePaddle/Paddle/pull/61532), [#61521](https://github.com/PaddlePaddle/Paddle/pull/61521), [#61517](https://github.com/PaddlePaddle/Paddle/pull/61517), [#61518](https://github.com/PaddlePaddle/Paddle/pull/61518), [#61550](https://github.com/PaddlePaddle/Paddle/pull/61550), [#61545](https://github.com/PaddlePaddle/Paddle/pull/61545), [#61548](https://github.com/PaddlePaddle/Paddle/pull/61548), [#61519](https://github.com/PaddlePaddle/Paddle/pull/61519), [#61549](https://github.com/PaddlePaddle/Paddle/pull/61549), [#61574](https://github.com/PaddlePaddle/Paddle/pull/61574), [#61585](https://github.com/PaddlePaddle/Paddle/pull/61585), [#61581](https://github.com/PaddlePaddle/Paddle/pull/61581), [#61553](https://github.com/PaddlePaddle/Paddle/pull/61553), [#61504](https://github.com/PaddlePaddle/Paddle/pull/61504), [#61603](https://github.com/PaddlePaddle/Paddle/pull/61603), [#61534](https://github.com/PaddlePaddle/Paddle/pull/61534), [#61567](https://github.com/PaddlePaddle/Paddle/pull/61567), [#61523](https://github.com/PaddlePaddle/Paddle/pull/61523), [#61565](https://github.com/PaddlePaddle/Paddle/pull/61565), [#61564](https://github.com/PaddlePaddle/Paddle/pull/61564), [#61707](https://github.com/PaddlePaddle/Paddle/pull/61707), [#61560](https://github.com/PaddlePaddle/Paddle/pull/61560), 
[#61684](https://github.com/PaddlePaddle/Paddle/pull/61684), [#61706](https://github.com/PaddlePaddle/Paddle/pull/61706), [#61724](https://github.com/PaddlePaddle/Paddle/pull/61724), [#61719](https://github.com/PaddlePaddle/Paddle/pull/61719), [#61729](https://github.com/PaddlePaddle/Paddle/pull/61729), [#61763](https://github.com/PaddlePaddle/Paddle/pull/61763), [#61755](https://github.com/PaddlePaddle/Paddle/pull/61755), [#61737](https://github.com/PaddlePaddle/Paddle/pull/61737), [#61750](https://github.com/PaddlePaddle/Paddle/pull/61750), [#61753](https://github.com/PaddlePaddle/Paddle/pull/61753), [#61756](https://github.com/PaddlePaddle/Paddle/pull/61756), [#61777](https://github.com/PaddlePaddle/Paddle/pull/61777), [#61758](https://github.com/PaddlePaddle/Paddle/pull/61758), [#61731](https://github.com/PaddlePaddle/Paddle/pull/61731), [#61771](https://github.com/PaddlePaddle/Paddle/pull/61771), [#61739](https://github.com/PaddlePaddle/Paddle/pull/61739), [#61559](https://github.com/PaddlePaddle/Paddle/pull/61559), [#61717](https://github.com/PaddlePaddle/Paddle/pull/61717), [#61733](https://github.com/PaddlePaddle/Paddle/pull/61733), [#61563](https://github.com/PaddlePaddle/Paddle/pull/61563), [#61546](https://github.com/PaddlePaddle/Paddle/pull/61546), [#61566](https://github.com/PaddlePaddle/Paddle/pull/61566), [#61562](https://github.com/PaddlePaddle/Paddle/pull/61562), [#61793](https://github.com/PaddlePaddle/Paddle/pull/61793), [#61902](https://github.com/PaddlePaddle/Paddle/pull/61902), [#61905](https://github.com/PaddlePaddle/Paddle/pull/61905), [#61904](https://github.com/PaddlePaddle/Paddle/pull/61904), [#62227](https://github.com/PaddlePaddle/Paddle/pull/62227), [#62332](https://github.com/PaddlePaddle/Paddle/pull/62332), [#62653](https://github.com/PaddlePaddle/Paddle/pull/62653), [#62681](https://github.com/PaddlePaddle/Paddle/pull/62681), [#62709](https://github.com/PaddlePaddle/Paddle/pull/62709), 
[#62794](https://github.com/PaddlePaddle/Paddle/pull/62794), [#62938](https://github.com/PaddlePaddle/Paddle/pull/62938), [#63185](https://github.com/PaddlePaddle/Paddle/pull/63185), [#63754](https://github.com/PaddlePaddle/Paddle/pull/63754), [#63769](https://github.com/PaddlePaddle/Paddle/pull/63769), [#63793](https://github.com/PaddlePaddle/Paddle/pull/63793), [#63830](https://github.com/PaddlePaddle/Paddle/pull/63830), [#63939](https://github.com/PaddlePaddle/Paddle/pull/63939), [#64340](https://github.com/PaddlePaddle/Paddle/pull/64340), [#64657](https://github.com/PaddlePaddle/Paddle/pull/64657), [#62527](https://github.com/PaddlePaddle/Paddle/pull/62527), [#64088](https://github.com/PaddlePaddle/Paddle/pull/64088), [#60203](https://github.com/PaddlePaddle/Paddle/pull/60203), [#60372](https://github.com/PaddlePaddle/Paddle/pull/60372), [#60685](https://github.com/PaddlePaddle/Paddle/pull/60685), [#60815](https://github.com/PaddlePaddle/Paddle/pull/60815), [#60791](https://github.com/PaddlePaddle/Paddle/pull/60791), [#60864](https://github.com/PaddlePaddle/Paddle/pull/60864), [#60851](https://github.com/PaddlePaddle/Paddle/pull/60851), [#60844](https://github.com/PaddlePaddle/Paddle/pull/60844), [#60694](https://github.com/PaddlePaddle/Paddle/pull/60694), [#60855](https://github.com/PaddlePaddle/Paddle/pull/60855), [#60869](https://github.com/PaddlePaddle/Paddle/pull/60869), [#60948](https://github.com/PaddlePaddle/Paddle/pull/60948), [#61042](https://github.com/PaddlePaddle/Paddle/pull/61042), [#61455](https://github.com/PaddlePaddle/Paddle/pull/61455), [#61580](https://github.com/PaddlePaddle/Paddle/pull/61580), [#61589](https://github.com/PaddlePaddle/Paddle/pull/61589), [#61609](https://github.com/PaddlePaddle/Paddle/pull/61609), [#61616](https://github.com/PaddlePaddle/Paddle/pull/61616), [#61715](https://github.com/PaddlePaddle/Paddle/pull/61715), [#61716](https://github.com/PaddlePaddle/Paddle/pull/61716), 
[#61759](https://github.com/PaddlePaddle/Paddle/pull/61759), [#61555](https://github.com/PaddlePaddle/Paddle/pull/61555), [#61492](https://github.com/PaddlePaddle/Paddle/pull/61492), [#61805](https://github.com/PaddlePaddle/Paddle/pull/61805), [#61712](https://github.com/PaddlePaddle/Paddle/pull/61712), [#61615](https://github.com/PaddlePaddle/Paddle/pull/61615), [#61713](https://github.com/PaddlePaddle/Paddle/pull/61713), [#62129](https://github.com/PaddlePaddle/Paddle/pull/62129), [#59294](https://github.com/PaddlePaddle/Paddle/pull/59294), [#59865](https://github.com/PaddlePaddle/Paddle/pull/59865), [#60270](https://github.com/PaddlePaddle/Paddle/pull/60270), [#60547](https://github.com/PaddlePaddle/Paddle/pull/60547), [#60698](https://github.com/PaddlePaddle/Paddle/pull/60698), [#60762](https://github.com/PaddlePaddle/Paddle/pull/60762), [#60753](https://github.com/PaddlePaddle/Paddle/pull/60753), [#60966](https://github.com/PaddlePaddle/Paddle/pull/60966), [#60976](https://github.com/PaddlePaddle/Paddle/pull/60976), [#61100](https://github.com/PaddlePaddle/Paddle/pull/61100), [#61203](https://github.com/PaddlePaddle/Paddle/pull/61203), [#61210](https://github.com/PaddlePaddle/Paddle/pull/61210), [#61424](https://github.com/PaddlePaddle/Paddle/pull/61424), [#61213](https://github.com/PaddlePaddle/Paddle/pull/61213), [#61275](https://github.com/PaddlePaddle/Paddle/pull/61275), [#61276](https://github.com/PaddlePaddle/Paddle/pull/61276), [#61279](https://github.com/PaddlePaddle/Paddle/pull/61279), [#61292](https://github.com/PaddlePaddle/Paddle/pull/61292), [#61295](https://github.com/PaddlePaddle/Paddle/pull/61295), [#61298](https://github.com/PaddlePaddle/Paddle/pull/61298), [#61299](https://github.com/PaddlePaddle/Paddle/pull/61299), [#61301](https://github.com/PaddlePaddle/Paddle/pull/61301), [#61302](https://github.com/PaddlePaddle/Paddle/pull/61302), [#61329](https://github.com/PaddlePaddle/Paddle/pull/61329), 
[#61804](https://github.com/PaddlePaddle/Paddle/pull/61804), [#62745](https://github.com/PaddlePaddle/Paddle/pull/62745), [#62909](https://github.com/PaddlePaddle/Paddle/pull/62909), [#64247](https://github.com/PaddlePaddle/Paddle/pull/64247), [#64308](https://github.com/PaddlePaddle/Paddle/pull/64308), [#60690](https://github.com/PaddlePaddle/Paddle/pull/60690), [#61149](https://github.com/PaddlePaddle/Paddle/pull/61149), [#61145](https://github.com/PaddlePaddle/Paddle/pull/61145), [#61193](https://github.com/PaddlePaddle/Paddle/pull/61193), [#61207](https://github.com/PaddlePaddle/Paddle/pull/61207), [#61229](https://github.com/PaddlePaddle/Paddle/pull/61229), [#61236](https://github.com/PaddlePaddle/Paddle/pull/61236), [#61244](https://github.com/PaddlePaddle/Paddle/pull/61244), [#61242](https://github.com/PaddlePaddle/Paddle/pull/61242), [#61263](https://github.com/PaddlePaddle/Paddle/pull/61263), [#61370](https://github.com/PaddlePaddle/Paddle/pull/61370), [#61410](https://github.com/PaddlePaddle/Paddle/pull/61410), [#61480](https://github.com/PaddlePaddle/Paddle/pull/61480), [#61522](https://github.com/PaddlePaddle/Paddle/pull/61522), [#61540](https://github.com/PaddlePaddle/Paddle/pull/61540), [#61520](https://github.com/PaddlePaddle/Paddle/pull/61520), [#61625](https://github.com/PaddlePaddle/Paddle/pull/61625), [#61700](https://github.com/PaddlePaddle/Paddle/pull/61700), [#61708](https://github.com/PaddlePaddle/Paddle/pull/61708), [#61736](https://github.com/PaddlePaddle/Paddle/pull/61736), [#61889](https://github.com/PaddlePaddle/Paddle/pull/61889), [#61952](https://github.com/PaddlePaddle/Paddle/pull/61952), [#62033](https://github.com/PaddlePaddle/Paddle/pull/62033), [#62637](https://github.com/PaddlePaddle/Paddle/pull/62637), [#62777](https://github.com/PaddlePaddle/Paddle/pull/62777), [#62779](https://github.com/PaddlePaddle/Paddle/pull/62779), [#63226](https://github.com/PaddlePaddle/Paddle/pull/63226), 
[#63287](https://github.com/PaddlePaddle/Paddle/pull/63287), [#63398](https://github.com/PaddlePaddle/Paddle/pull/63398), [#63431](https://github.com/PaddlePaddle/Paddle/pull/63431), [#64000](https://github.com/PaddlePaddle/Paddle/pull/64000), [#64058](https://github.com/PaddlePaddle/Paddle/pull/64058), [#64059](https://github.com/PaddlePaddle/Paddle/pull/64059), [#64063](https://github.com/PaddlePaddle/Paddle/pull/64063), [#64066](https://github.com/PaddlePaddle/Paddle/pull/64066), [#64089](https://github.com/PaddlePaddle/Paddle/pull/64089), [#64170](https://github.com/PaddlePaddle/Paddle/pull/64170), [#64235](https://github.com/PaddlePaddle/Paddle/pull/64235), [#64237](https://github.com/PaddlePaddle/Paddle/pull/64237), [#64243](https://github.com/PaddlePaddle/Paddle/pull/64243), [#64242](https://github.com/PaddlePaddle/Paddle/pull/64242), [#64286](https://github.com/PaddlePaddle/Paddle/pull/64286), [#64322](https://github.com/PaddlePaddle/Paddle/pull/64322), [#64317](https://github.com/PaddlePaddle/Paddle/pull/64317), [#64490](https://github.com/PaddlePaddle/Paddle/pull/64490), [#60138](https://github.com/PaddlePaddle/Paddle/pull/60138), [#62384](https://github.com/PaddlePaddle/Paddle/pull/62384), [#59702](https://github.com/PaddlePaddle/Paddle/pull/59702), [#60341](https://github.com/PaddlePaddle/Paddle/pull/60341), [#60636](https://github.com/PaddlePaddle/Paddle/pull/60636), [#60714](https://github.com/PaddlePaddle/Paddle/pull/60714), [#60716](https://github.com/PaddlePaddle/Paddle/pull/60716), [#60700](https://github.com/PaddlePaddle/Paddle/pull/60700), [#60702](https://github.com/PaddlePaddle/Paddle/pull/60702), [#60704](https://github.com/PaddlePaddle/Paddle/pull/60704), [#60715](https://github.com/PaddlePaddle/Paddle/pull/60715), [#60713](https://github.com/PaddlePaddle/Paddle/pull/60713), [#60711](https://github.com/PaddlePaddle/Paddle/pull/60711), [#60724](https://github.com/PaddlePaddle/Paddle/pull/60724), 
[#60803](https://github.com/PaddlePaddle/Paddle/pull/60803), [#61331](https://github.com/PaddlePaddle/Paddle/pull/61331), [#63286](https://github.com/PaddlePaddle/Paddle/pull/63286), [#60473](https://github.com/PaddlePaddle/Paddle/pull/60473), [#61046](https://github.com/PaddlePaddle/Paddle/pull/61046), [#61859](https://github.com/PaddlePaddle/Paddle/pull/61859), [#60675](https://github.com/PaddlePaddle/Paddle/pull/60675), [#60719](https://github.com/PaddlePaddle/Paddle/pull/60719), [#62863](https://github.com/PaddlePaddle/Paddle/pull/62863), [#63013](https://github.com/PaddlePaddle/Paddle/pull/63013), [#61293](https://github.com/PaddlePaddle/Paddle/pull/61293), [#62781](https://github.com/PaddlePaddle/Paddle/pull/62781), [#62935](https://github.com/PaddlePaddle/Paddle/pull/62935), [#63014](https://github.com/PaddlePaddle/Paddle/pull/63014), [#64203](https://github.com/PaddlePaddle/Paddle/pull/64203), [#63349](https://github.com/PaddlePaddle/Paddle/pull/63349), [#59572](https://github.com/PaddlePaddle/Paddle/pull/59572), [#59911](https://github.com/PaddlePaddle/Paddle/pull/59911), [#59861](https://github.com/PaddlePaddle/Paddle/pull/59861), [#60014](https://github.com/PaddlePaddle/Paddle/pull/60014), [#59913](https://github.com/PaddlePaddle/Paddle/pull/59913), [#58889](https://github.com/PaddlePaddle/Paddle/pull/58889), [#60114](https://github.com/PaddlePaddle/Paddle/pull/60114), [#59928](https://github.com/PaddlePaddle/Paddle/pull/59928), [#60180](https://github.com/PaddlePaddle/Paddle/pull/60180), [#60168](https://github.com/PaddlePaddle/Paddle/pull/60168), [#60166](https://github.com/PaddlePaddle/Paddle/pull/60166), [#60250](https://github.com/PaddlePaddle/Paddle/pull/60250), [#60247](https://github.com/PaddlePaddle/Paddle/pull/60247), [#60172](https://github.com/PaddlePaddle/Paddle/pull/60172), [#59661](https://github.com/PaddlePaddle/Paddle/pull/59661), [#58880](https://github.com/PaddlePaddle/Paddle/pull/58880), 
[#60291](https://github.com/PaddlePaddle/Paddle/pull/60291), [#58881](https://github.com/PaddlePaddle/Paddle/pull/58881), [#58955](https://github.com/PaddlePaddle/Paddle/pull/58955), [#58684](https://github.com/PaddlePaddle/Paddle/pull/58684), [#58708](https://github.com/PaddlePaddle/Paddle/pull/58708), [#60323](https://github.com/PaddlePaddle/Paddle/pull/60323), [#58762](https://github.com/PaddlePaddle/Paddle/pull/58762), [#60048](https://github.com/PaddlePaddle/Paddle/pull/60048), [#60345](https://github.com/PaddlePaddle/Paddle/pull/60345), [#60325](https://github.com/PaddlePaddle/Paddle/pull/60325), [#59627](https://github.com/PaddlePaddle/Paddle/pull/59627), [#60416](https://github.com/PaddlePaddle/Paddle/pull/60416), [#60434](https://github.com/PaddlePaddle/Paddle/pull/60434), [#59801](https://github.com/PaddlePaddle/Paddle/pull/59801), [#60619](https://github.com/PaddlePaddle/Paddle/pull/60619), [#60445](https://github.com/PaddlePaddle/Paddle/pull/60445), [#60666](https://github.com/PaddlePaddle/Paddle/pull/60666), [#60353](https://github.com/PaddlePaddle/Paddle/pull/60353), [#60733](https://github.com/PaddlePaddle/Paddle/pull/60733), [#60693](https://github.com/PaddlePaddle/Paddle/pull/60693), [#60350](https://github.com/PaddlePaddle/Paddle/pull/60350), [#61096](https://github.com/PaddlePaddle/Paddle/pull/61096), [#61121](https://github.com/PaddlePaddle/Paddle/pull/61121), [#61164](https://github.com/PaddlePaddle/Paddle/pull/61164), [#62054](https://github.com/PaddlePaddle/Paddle/pull/62054), [#62136](https://github.com/PaddlePaddle/Paddle/pull/62136), [#62508](https://github.com/PaddlePaddle/Paddle/pull/62508), [#62988](https://github.com/PaddlePaddle/Paddle/pull/62988), [#63472](https://github.com/PaddlePaddle/Paddle/pull/63472), [#60193](https://github.com/PaddlePaddle/Paddle/pull/60193), [#60197](https://github.com/PaddlePaddle/Paddle/pull/60197), [#60198](https://github.com/PaddlePaddle/Paddle/pull/60198), 
[#60346](https://github.com/PaddlePaddle/Paddle/pull/60346), [#60318](https://github.com/PaddlePaddle/Paddle/pull/60318), [#60645](https://github.com/PaddlePaddle/Paddle/pull/60645), [#60650](https://github.com/PaddlePaddle/Paddle/pull/60650), [#60660](https://github.com/PaddlePaddle/Paddle/pull/60660), [#60706](https://github.com/PaddlePaddle/Paddle/pull/60706), [#60799](https://github.com/PaddlePaddle/Paddle/pull/60799), [#60837](https://github.com/PaddlePaddle/Paddle/pull/60837), [#60817](https://github.com/PaddlePaddle/Paddle/pull/60817), [#60820](https://github.com/PaddlePaddle/Paddle/pull/60820), [#60894](https://github.com/PaddlePaddle/Paddle/pull/60894), [#61079](https://github.com/PaddlePaddle/Paddle/pull/61079), [#61087](https://github.com/PaddlePaddle/Paddle/pull/61087), [#61073](https://github.com/PaddlePaddle/Paddle/pull/61073), [#61072](https://github.com/PaddlePaddle/Paddle/pull/61072), [#61127](https://github.com/PaddlePaddle/Paddle/pull/61127), [#61097](https://github.com/PaddlePaddle/Paddle/pull/61097), [#61365](https://github.com/PaddlePaddle/Paddle/pull/61365), [#61456](https://github.com/PaddlePaddle/Paddle/pull/61456), [#61846](https://github.com/PaddlePaddle/Paddle/pull/61846), [#62217](https://github.com/PaddlePaddle/Paddle/pull/62217), [#62519](https://github.com/PaddlePaddle/Paddle/pull/62519), [#62881](https://github.com/PaddlePaddle/Paddle/pull/62881), [#62880](https://github.com/PaddlePaddle/Paddle/pull/62880), [#59723](https://github.com/PaddlePaddle/Paddle/pull/59723), [#59722](https://github.com/PaddlePaddle/Paddle/pull/59722), [#59797](https://github.com/PaddlePaddle/Paddle/pull/59797), [#59960](https://github.com/PaddlePaddle/Paddle/pull/59960), [#59761](https://github.com/PaddlePaddle/Paddle/pull/59761), [#59996](https://github.com/PaddlePaddle/Paddle/pull/59996), [#60009](https://github.com/PaddlePaddle/Paddle/pull/60009), [#58896](https://github.com/PaddlePaddle/Paddle/pull/58896), 
[#60051](https://github.com/PaddlePaddle/Paddle/pull/60051), [#60410](https://github.com/PaddlePaddle/Paddle/pull/60410), [#60420](https://github.com/PaddlePaddle/Paddle/pull/60420), [#60548](https://github.com/PaddlePaddle/Paddle/pull/60548), [#60575](https://github.com/PaddlePaddle/Paddle/pull/60575), [#60726](https://github.com/PaddlePaddle/Paddle/pull/60726), [#60809](https://github.com/PaddlePaddle/Paddle/pull/60809), [#61346](https://github.com/PaddlePaddle/Paddle/pull/61346), [#61222](https://github.com/PaddlePaddle/Paddle/pull/61222), [#61099](https://github.com/PaddlePaddle/Paddle/pull/61099), [#62254](https://github.com/PaddlePaddle/Paddle/pull/62254), [#62269](https://github.com/PaddlePaddle/Paddle/pull/62269), [#62362](https://github.com/PaddlePaddle/Paddle/pull/62362) -- Improve the underlying error checking mechanism of PaddlePaddle to facilitate developers' debugging. [#62571](https://github.com/PaddlePaddle/Paddle/pull/62571), [#62602](https://github.com/PaddlePaddle/Paddle/pull/62602), [#60903](https://github.com/PaddlePaddle/Paddle/pull/60903), [#64695](https://github.com/PaddlePaddle/Paddle/pull/64695), [#59907](https://github.com/PaddlePaddle/Paddle/pull/59907), [#62018](https://github.com/PaddlePaddle/Paddle/pull/62018), [#62839](https://github.com/PaddlePaddle/Paddle/pull/62839), [#60651](https://github.com/PaddlePaddle/Paddle/pull/60651), [#61488](https://github.com/PaddlePaddle/Paddle/pull/61488), [#64064](https://github.com/PaddlePaddle/Paddle/pull/64064), [#63192](https://github.com/PaddlePaddle/Paddle/pull/63192), [#63525](https://github.com/PaddlePaddle/Paddle/pull/63525) +### Discontinued Features -### Vulnerability Fixing +- Remove outdated test cases.
[#66269](https://github.com/PaddlePaddle/Paddle/pull/66269), [#66690](https://github.com/PaddlePaddle/Paddle/pull/66690), [#67505](https://github.com/PaddlePaddle/Paddle/pull/67505), [#67464](https://github.com/PaddlePaddle/Paddle/pull/67464), [#68400](https://github.com/PaddlePaddle/Paddle/pull/68400), [#68178](https://github.com/PaddlePaddle/Paddle/pull/68178), [#68194](https://github.com/PaddlePaddle/Paddle/pull/68194) +- Clean up obsolete flags and configurations. [#69124](https://github.com/PaddlePaddle/Paddle/pull/69124), [#69176](https://github.com/PaddlePaddle/Paddle/pull/69176), [#69274](https://github.com/PaddlePaddle/Paddle/pull/69274), [#68384](https://github.com/PaddlePaddle/Paddle/pull/68384) +- Eliminate old APIs. [#66032](https://github.com/PaddlePaddle/Paddle/pull/66032), [#67303](https://github.com/PaddlePaddle/Paddle/pull/67303) +- Clean up redundant PIR strategies and unit tests. [#66366](https://github.com/PaddlePaddle/Paddle/pull/66366), [#70534](https://github.com/PaddlePaddle/Paddle/pull/70534), [#68444](https://github.com/PaddlePaddle/Paddle/pull/68444), [#70599](https://github.com/PaddlePaddle/Paddle/pull/70599), [#68801](https://github.com/PaddlePaddle/Paddle/pull/68801), [#66303](https://github.com/PaddlePaddle/Paddle/pull/66303), [#67854](https://github.com/PaddlePaddle/Paddle/pull/67854), [#70795](https://github.com/PaddlePaddle/Paddle/pull/70795) +- Discard the related unit tests and APIs for dynamic-to-static conversion.
[#66421](https://github.com/PaddlePaddle/Paddle/pull/66421), [#68251](https://github.com/PaddlePaddle/Paddle/pull/68251), [#68252](https://github.com/PaddlePaddle/Paddle/pull/68252), [#68253](https://github.com/PaddlePaddle/Paddle/pull/68253), [#68254](https://github.com/PaddlePaddle/Paddle/pull/68254), [#68409](https://github.com/PaddlePaddle/Paddle/pull/68409), [#70569](https://github.com/PaddlePaddle/Paddle/pull/70569), [#71279](https://github.com/PaddlePaddle/Paddle/pull/71279) +- Discard the related unit tests for automatic parallelism. [#67857](https://github.com/PaddlePaddle/Paddle/pull/67857), [#67862](https://github.com/PaddlePaddle/Paddle/pull/67862), [#67995](https://github.com/PaddlePaddle/Paddle/pull/67995), [#68012](https://github.com/PaddlePaddle/Paddle/pull/68012), [#68013](https://github.com/PaddlePaddle/Paddle/pull/68013), [#67798](https://github.com/PaddlePaddle/Paddle/pull/67798) -- Fix potential security vulnerabilities. [#59957](https://github.com/PaddlePaddle/Paddle/pull/59957), [#61032](https://github.com/PaddlePaddle/Paddle/pull/61032), [#61356](https://github.com/PaddlePaddle/Paddle/pull/61356), [#61573](https://github.com/PaddlePaddle/Paddle/pull/61573), [#61671](https://github.com/PaddlePaddle/Paddle/pull/61671), [#62345](https://github.com/PaddlePaddle/Paddle/pull/62345), [#60097](https://github.com/PaddlePaddle/Paddle/pull/60097), [#61161](https://github.com/PaddlePaddle/Paddle/pull/61161), [#61294](https://github.com/PaddlePaddle/Paddle/pull/61294), [#61349](https://github.com/PaddlePaddle/Paddle/pull/61349), [#61344](https://github.com/PaddlePaddle/Paddle/pull/61344), [#61162](https://github.com/PaddlePaddle/Paddle/pull/61162), [#61285](https://github.com/PaddlePaddle/Paddle/pull/61285), [#61826](https://github.com/PaddlePaddle/Paddle/pull/61826), [#59967](https://github.com/PaddlePaddle/Paddle/pull/59967), [#59976](https://github.com/PaddlePaddle/Paddle/pull/59976), 
[#59979](https://github.com/PaddlePaddle/Paddle/pull/59979), [#60527](https://github.com/PaddlePaddle/Paddle/pull/60527), [#60646](https://github.com/PaddlePaddle/Paddle/pull/60646), [#61827](https://github.com/PaddlePaddle/Paddle/pull/61827) +## 3. Compiler architecture -### Deprecated Features +The CINN compiler has seen comprehensive improvements in completeness and performance. In this version, we have conducted thorough optimizations across all aspects of the compiler's front-end and back-end, including the addition of an automatic Re-Compute mechanism for backward computation graphs, front-end Pass performance optimization, symbol derivation mechanism upgrades, operator fusion strategy optimization, back-end Schedule strategy, and enhanced subscript expression simplification capabilities. At the same time, we have investigated and fixed a large number of correctness and performance issues, systematically enhancing the compiler's general optimization capabilities. When the CINN compiler is enabled for the PaddlePaddle PaddleX series models, over 60% of the models show significant performance improvements compared to dynamic graph mode. -- Clean up deprecated executors and other logic to reduce redundant code. [#64822](https://github.com/PaddlePaddle/Paddle/pull/64822), [#60941](https://github.com/PaddlePaddle/Paddle/pull/60941) +### New Features -## Compiler Infrastructure for Neural Networks (CINN) +1. New hardware backend support: Added support for two new backends, HIP and SYCL.
([#65146](https://github.com/PaddlePaddle/Paddle/pull/65146), [#65329](https://github.com/PaddlePaddle/Paddle/pull/65329), [#69554](https://github.com/PaddlePaddle/Paddle/pull/69554), [#71204](https://github.com/PaddlePaddle/Paddle/pull/71204), [#65438](https://github.com/PaddlePaddle/Paddle/pull/65438), [#66476](https://github.com/PaddlePaddle/Paddle/pull/66476), [#66620](https://github.com/PaddlePaddle/Paddle/pull/66620), [#67813](https://github.com/PaddlePaddle/Paddle/pull/67813)) +2. Added support for manual setting of numerical ranges, equality constraints, and other information for symbolic dimensions in inference scenarios. ([#67628](https://github.com/PaddlePaddle/Paddle/pull/67628), [#67384](https://github.com/PaddlePaddle/Paddle/pull/67384)) -In version 3.0, the compiler architecture has been significantly upgraded. Based on Shape Dialect, a symbolic automatic derivation and simplification system has been built, supporting symbolic expression and constraint construction as well as end-to-end execution of the compiler under dynamic shapes. Meanwhile, CINN has upgraded the automatic subgraph fusion and Pass Pipeline mechanisms, and merged the core modules and iteration paths of dynamic and static shapes, so that the architecture is clear and unified. In this version, important back-end modules such as AST Compute, Schedule strategy, and Tiling have been refactored, improving the general optimization capability of the compiler, and the training and inference correctness and speedup of dynamic shapes have been verified on subgraphs of the PaddlePaddle Industry Suite models and on typical large models such as Llama2-7B and Stable Diffusion. +### Function optimization -### New Features +1. Optimize the printing of error messages to enhance the development and debugging experience.
([#67738](https://github.com/PaddlePaddle/Paddle/pull/67738), [#68769](https://github.com/PaddlePaddle/Paddle/pull/68769), [#71076](https://github.com/PaddlePaddle/Paddle/pull/71076)) +2. Support the Welford algorithm, which can simultaneously ensure the performance and accuracy of BatchNorm-related operator Kernels. ([#71184](https://github.com/PaddlePaddle/Paddle/pull/71184), [#71057](https://github.com/PaddlePaddle/Paddle/pull/71057)) + +### Performance optimization + +1. New backend optimization strategies such as GridReduce, Loop merging, Transpose tuning, and automatic vectorization have been added, significantly enhancing Kernel performance across various dimensional spaces and under different hardware configurations in all scenarios. ([#67236](https://github.com/PaddlePaddle/Paddle/pull/67236), [#68897](https://github.com/PaddlePaddle/Paddle/pull/68897), [#69409](https://github.com/PaddlePaddle/Paddle/pull/69409), [#65336](https://github.com/PaddlePaddle/Paddle/pull/65336), [#66419](https://github.com/PaddlePaddle/Paddle/pull/66419), [#68338](https://github.com/PaddlePaddle/Paddle/pull/68338), [#68364](https://github.com/PaddlePaddle/Paddle/pull/68364), [#71087](https://github.com/PaddlePaddle/Paddle/pull/71087), [#68019](https://github.com/PaddlePaddle/Paddle/pull/68019), [#68122](https://github.com/PaddlePaddle/Paddle/pull/68122), [#65187](https://github.com/PaddlePaddle/Paddle/pull/65187), [#66742](https://github.com/PaddlePaddle/Paddle/pull/66742), [#67083](https://github.com/PaddlePaddle/Paddle/pull/67083), [#68667](https://github.com/PaddlePaddle/Paddle/pull/68667), [#68750](https://github.com/PaddlePaddle/Paddle/pull/68750), [#69376](https://github.com/PaddlePaddle/Paddle/pull/69376), [#69350](https://github.com/PaddlePaddle/Paddle/pull/69350), [#69740](https://github.com/PaddlePaddle/Paddle/pull/69740), [#68918](https://github.com/PaddlePaddle/Paddle/pull/68918), [#70092](https://github.com/PaddlePaddle/Paddle/pull/70092),
[#69607](https://github.com/PaddlePaddle/Paddle/pull/69607), [#69794](https://github.com/PaddlePaddle/Paddle/pull/69794), [#70258](https://github.com/PaddlePaddle/Paddle/pull/70258), [#70547](https://github.com/PaddlePaddle/Paddle/pull/70547), [#70581](https://github.com/PaddlePaddle/Paddle/pull/70581), [#70649](https://github.com/PaddlePaddle/Paddle/pull/70649), [#69732](https://github.com/PaddlePaddle/Paddle/pull/69732), [#70786](https://github.com/PaddlePaddle/Paddle/pull/70786), [#70942](https://github.com/PaddlePaddle/Paddle/pull/70942), [#71014](https://github.com/PaddlePaddle/Paddle/pull/71014), [#71263](https://github.com/PaddlePaddle/Paddle/pull/71263), [#71249](https://github.com/PaddlePaddle/Paddle/pull/71249), [#71340](https://github.com/PaddlePaddle/Paddle/pull/71340), [#71301](https://github.com/PaddlePaddle/Paddle/pull/71301), [#71380](https://github.com +2. Optimize operator fusion strategies, upgrading various strategies including horizontal fusion, multi-downstream fusion, Reshape alignment fusion, etc., to further enhance the fusion capabilities of operators and improve end-to-end optimization performance. ([#66034](https://github.com/PaddlePaddle/Paddle/pull/66034), [#67829](https://github.com/PaddlePaddle/Paddle/pull/67829), [#68171](https://github.com/PaddlePaddle/Paddle/pull/68171), [#69478](https://github.com/PaddlePaddle/Paddle/pull/69478), [#69691](https://github.com/PaddlePaddle/Paddle/pull/69691), [#70665](https://github.com/PaddlePaddle/Paddle/pull/70665), [#71103](https://github.com/PaddlePaddle/Paddle/pull/71103), [#70873](https://github.com/PaddlePaddle/Paddle/pull/70873)) +3. The simplification capability of backend subscript expressions has been upgraded, supporting the simplification of complex expressions with dynamic and static dimensions, significantly reducing the subscript computation overhead in the generated backend Kernel. 
([#68011](https://github.com/PaddlePaddle/Paddle/pull/68011), [#68617](https://github.com/PaddlePaddle/Paddle/pull/68617), [#68624](https://github.com/PaddlePaddle/Paddle/pull/68624), [#68685](https://github.com/PaddlePaddle/Paddle/pull/68685), [#68220](https://github.com/PaddlePaddle/Paddle/pull/68220), [#68720](https://github.com/PaddlePaddle/Paddle/pull/68720), [#68753](https://github.com/PaddlePaddle/Paddle/pull/68753), [#68986](https://github.com/PaddlePaddle/Paddle/pull/68986), [#68987](https://github.com/PaddlePaddle/Paddle/pull/68987), [#69071](https://github.com/PaddlePaddle/Paddle/pull/69071), [#69164](https://github.com/PaddlePaddle/Paddle/pull/69164), [#69282](https://github.com/PaddlePaddle/Paddle/pull/69282), [#69522](https://github.com/PaddlePaddle/Paddle/pull/69522), [#69857](https://github.com/PaddlePaddle/Paddle/pull/69857), [#70208](https://github.com/PaddlePaddle/Paddle/pull/70208), [#70355](https://github.com/PaddlePaddle/Paddle/pull/70355), [#70427](https://github.com/PaddlePaddle/Paddle/pull/70427), [#70450](https://github.com/PaddlePaddle/Paddle/pull/70450), [#68737](https://github.com/PaddlePaddle/Paddle/pull/68737), [#70500](https://github.com/PaddlePaddle/Paddle/pull/70500), [#70953](https://github.com/PaddlePaddle/Paddle/pull/70953), [#70933](https://github.com/PaddlePaddle/Paddle/pull/70933), [#71026](https://github.com/PaddlePaddle/Paddle/pull/71026), [#70456](https://github.com/PaddlePaddle/Paddle/pull/70456), [#70257](https://github.com/PaddlePaddle/Paddle/pull/70257), [#70461](https://github.com/PaddlePaddle/Paddle/pull/70461), [#70142](https://github.com/PaddlePaddle/Paddle/pull/70142), [#71018](https://github.com/PaddlePaddle/Paddle/pull/71018), [#71278](https://github.com/PaddlePaddle/Paddle/pull/71278)) +4. A new automatic Re-Compute mechanism for backward computation graphs has been added, which can effectively reduce model training memory usage and improve performance. 
([#69342](https://github.com/PaddlePaddle/Paddle/pull/69342), [#70255](https://github.com/PaddlePaddle/Paddle/pull/70255), [#68241](https://github.com/PaddlePaddle/Paddle/pull/68241), [#69954](https://github.com/PaddlePaddle/Paddle/pull/69954), [#70832](https://github.com/PaddlePaddle/Paddle/pull/70832)) +5. Optimize the backend Host and Device code compilation process to reduce compilation time and improve the processing performance of branches in the Broadcast scenario. ([#65669](https://github.com/PaddlePaddle/Paddle/pull/65669), [#65916](https://github.com/PaddlePaddle/Paddle/pull/65916), [#66109](https://github.com/PaddlePaddle/Paddle/pull/66109), [#65611](https://github.com/PaddlePaddle/Paddle/pull/65611), [#65990](https://github.com/PaddlePaddle/Paddle/pull/65990), [#66088](https://github.com/PaddlePaddle/Paddle/pull/66088), [#66207](https://github.com/PaddlePaddle/Paddle/pull/66207), [#66537](https://github.com/PaddlePaddle/Paddle/pull/66537), [#66768](https://github.com/PaddlePaddle/Paddle/pull/66768), [#70685](https://github.com/PaddlePaddle/Paddle/pull/70685), [#71410](https://github.com/PaddlePaddle/Paddle/pull/71410), [#66062](https://github.com/PaddlePaddle/Paddle/pull/66062)) +6. 
Improved and upgraded the mechanisms for symbol derivation, simplification, and caching in dynamic dimensions, added symbol derivation interface implementations for all conventional operators (580+), and provided more constraint information for Kernel compilation.([#65343](https://github.com/PaddlePaddle/Paddle/pull/65343)、[#66582](https://github.com/PaddlePaddle/Paddle/pull/66582)、[#65500](https://github.com/PaddlePaddle/Paddle/pull/65500)、[#65591](https://github.com/PaddlePaddle/Paddle/pull/65591)、[#66637](https://github.com/PaddlePaddle/Paddle/pull/66637)、[#68208](https://github.com/PaddlePaddle/Paddle/pull/68208)、[#68056](https://github.com/PaddlePaddle/Paddle/pull/68056)、[#68015](https://github.com/PaddlePaddle/Paddle/pull/68015)、[#68096](https://github.com/PaddlePaddle/Paddle/pull/68096)、[#68236](https://github.com/PaddlePaddle/Paddle/pull/68236)、[#68973](https://github.com/PaddlePaddle/Paddle/pull/68973)、[#68967](https://github.com/PaddlePaddle/Paddle/pull/68967)、[#69133](https://github.com/PaddlePaddle/Paddle/pull/69133)、[#68550](https://github.com/PaddlePaddle/Paddle/pull/68550)、[#68882](https://github.com/PaddlePaddle/Paddle/pull/68882)、[#69005](https://github.com/PaddlePaddle/Paddle/pull/69005)、[#69911](https://github.com/PaddlePaddle/Paddle/pull/69911)、[#70376](https://github.com/PaddlePaddle/Paddle/pull/70376)、[#71153](https://github.com/PaddlePaddle/Paddle/pull/71153)、[#66644](https://github.com/PaddlePaddle/Paddle/pull/66644)、[#66650](https://github.com/PaddlePaddle/Paddle/pull/66650)、[#66642](https://github.com/PaddlePaddle/Paddle/pull/66642)、[#66729](https://github.com/PaddlePaddle/Paddle/pull/66729)、[#66838](https://github.com/PaddlePaddle/Paddle/pull/66838)、[#66762](https://github.com/PaddlePaddle/Paddle/pull/66762)、[#66580](https://github.com/PaddlePaddle/Paddle/pull/66580)、[#66612](https://github.com/PaddlePaddle/Paddle/pull/66612)、[#66625](https://github.com/PaddlePaddle/Paddle/pull/66625)、[#66643](https://github.com/PaddlePaddle/Paddle/pull/66
643)、[#66837](https://github.com/PaddlePaddle/Paddle/pull/66837)、[#66946](https://github.com/PaddlePaddle/Paddle/pull/66946)、[#67018](https://github.com/PaddlePaddle/Paddle/pull/67018)、[#67049](https://github.com/PaddlePaddle/Paddle/pull/67049)、[#66956](https://github.com/PaddlePaddle/Paddle/pull/66956)、[#67008](https://github.com/PaddlePaddle/Paddle/pull/67008)、[#66930](https://github.com/PaddlePaddle/Paddle/pull/66930)、[#66877](https://github.com/PaddlePaddle/Paddle/pull/66877)、[#66896](https://github.com/PaddlePaddle/Paddle/pull/66896)、[#67120](https://github.com/PaddlePaddle/Paddle/pull/67120)、[#67117](https://github.com/PaddlePaddle/Paddle/pull/67117)、[#67098](https://github.com/PaddlePaddle/Paddle/pull/67098)、[#67136](https://github.com/PaddlePaddle/Paddle/pull/67136)、[#67294](https://github.com/PaddlePaddle/Paddle/pull/67294)、[#67327](https://github.com/PaddlePaddle/Paddle/pull/67327)、[#66827](https://github.com/PaddlePaddle/Paddle/pull/66827)、[#67201](https://github.com/PaddlePaddle/Paddle/pull/67201)、[#66892](https://github.com/PaddlePaddle/Paddle/pull/66892)、[#67377](https://github.com/PaddlePaddle/Paddle/pull/67377)、[#66619](https://github.com/PaddlePaddle/Paddle/pull/66619)、[#67037](https://github.com/PaddlePaddle/Paddle/pull/67037)、[#67412](https://github.com/PaddlePaddle/Paddle/pull/67412)、[#67394](https://github.com/PaddlePaddle/Paddle/pull/67394)、[#67374](https://github.com/PaddlePaddle/Paddle/pull/67374)、[#67418](https://github.com/PaddlePaddle/Paddle/pull/67418)、[#67348](https://github.com/PaddlePaddle/Paddle/pull/67348)、[#67337](https://github.com/PaddlePaddle/Paddle/pull/67337)、[#67390](https://github.com/PaddlePaddle/Paddle/pull/67390)、[#67407](https://github.com/PaddlePaddle/Paddle/pull/67407)、[#67491](https://github.com/PaddlePaddle/Paddle/pull/67491)、[#67422](https://github.com/PaddlePaddle/Paddle/pull/67422)、[#67461](https://github.com/PaddlePaddle/Paddle/pull/67461)、[#67458](https://github.com/PaddlePaddle/Paddle/pull/67458)、[#67486](https:
//github.com/PaddlePaddle/Paddle/pull/67486)、[#67490](https://github.com/PaddlePaddle/Paddle/pull/67490)、[#67462](https://github.com/PaddlePaddle/Paddle/pull/67462)、[#67364](https://github.com/PaddlePaddle/Paddle/pull/67364)、[#67435](https://github.com/PaddlePaddle/Paddle/pull/67435)、[#67665](https://github.com/PaddlePaddle/Paddle/pull/67665)、[#67426](https://github.com/PaddlePaddle/Paddle/pull/67426)、[#67507](https://github.com/PaddlePaddle/Paddle/pull/67507)、[#67730](https://github.com/PaddlePaddle/Paddle/pull/67730)、[#67776](https://github.com/PaddlePaddle/Paddle/pull/67776)、[#67806](https://github.com/PaddlePaddle/Paddle/pull/67806)、[#67803](https://github.com/PaddlePaddle/Paddle/pull/67803)、[#67788](https://github.com/PaddlePaddle/Paddle/pull/67788)、[#67705](https://github.com/PaddlePaddle/Paddle/pull/67705)、[#67814](https://github.com/PaddlePaddle/Paddle/pull/67814)、[#67858](https://github.com/PaddlePaddle/Paddle/pull/67858)、[#67751](https://github.com/PaddlePaddle/Paddle/pull/67751)、[#67875](https://github.com/PaddlePaddle/Paddle/pull/67875)、[#67663](https://github.com/PaddlePaddle/Paddle/pull/67663)、[#67434](https://github.com/PaddlePaddle/Paddle/pull/67434)、[#67818](https://github.com/PaddlePaddle/Paddle/pull/67818)、[#68180](https://github.com/PaddlePaddle/Paddle/pull/68180)、[#68547](https://github.com/PaddlePaddle/Paddle/pull/68547)、[#68548](https://github.com/PaddlePaddle/Paddle/pull/68548)、[#68670](https://github.com/PaddlePaddle/Paddle/pull/68670)、[#68964](https://github.com/PaddlePaddle/Paddle/pull/68964)、[#68929](https://github.com/PaddlePaddle/Paddle/pull/68929)、[#68907](https://github.com/PaddlePaddle/Paddle/pull/68907)、[#68917](https://github.com/PaddlePaddle/Paddle/pull/68917)、[#68984](https://github.com/PaddlePaddle/Paddle/pull/68984)、[#68644](https://github.com/PaddlePaddle/Paddle/pull/68644)、[#69167](https://github.com/PaddlePaddle/Paddle/pull/69167)、[#68975](https://github.com/PaddlePaddle/Paddle/pull/68975)、[#68947](https://github.com/PaddleP
addle/Paddle/pull/68947)、[#68978](https://github.com/PaddlePaddle/Paddle/pull/68978)、[#68980](https://github.com/PaddlePaddle/Paddle/pull/68980)、[#68979](https://github.com/PaddlePaddle/Paddle/pull/68979)、[#69329](https://github.com/PaddlePaddle/Paddle/pull/69329)、[#69055](https://github.com/PaddlePaddle/Paddle/pull/69055)、[#69331](https://github.com/PaddlePaddle/Paddle/pull/69331)、[#69414](https://github.com/PaddlePaddle/Paddle/pull/69414)、[#69335](https://github.com/PaddlePaddle/Paddle/pull/69335)、[#69017](https://github.com/PaddlePaddle/Paddle/pull/69017)、[#69344](https://github.com/PaddlePaddle/Paddle/pull/69344)、[#69069](https://github.com/PaddlePaddle/Paddle/pull/69069)、[#69698](https://github.com/PaddlePaddle/Paddle/pull/69698)、[#69919](https://github.com/PaddlePaddle/Paddle/pull/69919)、[#69964](https://github.com/PaddlePaddle/Paddle/pull/69964)、[#70337](https://github.com/PaddlePaddle/Paddle/pull/70337)、[#70282](https://github.com/PaddlePaddle/Paddle/pull/70282)、[#70741](https://github.com/PaddlePaddle/Paddle/pull/70741)、[#70818](https://github.com/PaddlePaddle/Paddle/pull/70818)、[#71031](https://github.com/PaddlePaddle/Paddle/pull/71031)、[#70541](https://github.com/PaddlePaddle/Paddle/pull/70541)、[#66609](https://github.com/PaddlePaddle/Paddle/pull/66609)、[#66889](https://github.com/PaddlePaddle/Paddle/pull/66889)、[#66633](https://github.com/PaddlePaddle/Paddle/pull/66633)、[#66735](https://github.com/PaddlePaddle/Paddle/pull/66735)、[#66935](https://github.com/PaddlePaddle/Paddle/pull/66935)、[#66627](https://github.com/PaddlePaddle/Paddle/pull/66627)、[#66730](https://github.com/PaddlePaddle/Paddle/pull/66730)、[#67210](https://github.com/PaddlePaddle/Paddle/pull/67210)、[#67115](https://github.com/PaddlePaddle/Paddle/pull/67115)、[#67275](https://github.com/PaddlePaddle/Paddle/pull/67275)、[#67472](https://github.com/PaddlePaddle/Paddle/pull/67472)、[#67577](https://github.com/PaddlePaddle/Paddle/pull/67577)、[#67328](https://github.com/PaddlePaddle/Paddle/pull/67
328)、[#67566](https://github.com/PaddlePaddle/Paddle/pull/67566)、[#67451](https://github.com/PaddlePaddle/Paddle/pull/67451)、[#68098](https://github.com/PaddlePaddle/Paddle/pull/68098)、[#68225](https://github.com/PaddlePaddle/Paddle/pull/68225)、[#68177](https://github.com/PaddlePaddle/Paddle/pull/68177)、[#68102](https://github.com/PaddlePaddle/Paddle/pull/68102)、[#67951](https://github.com/PaddlePaddle/Paddle/pull/67951)、[#67957](https://github.com/PaddlePaddle/Paddle/pull/67957)、[#68235](https://github.com/PaddlePaddle/Paddle/pull/68235)、[#68447](https://github.com/PaddlePaddle/Paddle/pull/68447)、[#68446](https://github.com/PaddlePaddle/Paddle/pull/68446)、[#68183](https://github.com/PaddlePaddle/Paddle/pull/68183)、[#68318](https://github.com/PaddlePaddle/Paddle/pull/68318)、[#68385](https://github.com/PaddlePaddle/Paddle/pull/68385)、[#67635](https://github.com/PaddlePaddle/Paddle/pull/67635)、[#65623](https://github.com/PaddlePaddle/Paddle/pull/65623)、[#65956](https://github.com/PaddlePaddle/Paddle/pull/65956)、[#66063](https://github.com/PaddlePaddle/Paddle/pull/66063)、[#65992](https://github.com/PaddlePaddle/Paddle/pull/65992)、[#65880](https://github.com/PaddlePaddle/Paddle/pull/65880)、[#66343](https://github.com/PaddlePaddle/Paddle/pull/66343)、[#65889](https://github.com/PaddlePaddle/Paddle/pull/65889)、[#66606](https://github.com/PaddlePaddle/Paddle/pull/66606)、[#66618](https://github.com/PaddlePaddle/Paddle/pull/66618)、[#66737](https://github.com/PaddlePaddle/Paddle/pull/66737)、[#66607](https://github.com/PaddlePaddle/Paddle/pull/66607)、[#66579](https://github.com/PaddlePaddle/Paddle/pull/66579)、[#66732](https://github.com/PaddlePaddle/Paddle/pull/66732)、[#66849](https://github.com/PaddlePaddle/Paddle/pull/66849)、[#66400](https://github.com/PaddlePaddle/Paddle/pull/66400)、[#66952](https://github.com/PaddlePaddle/Paddle/pull/66952)、[#66570](https://github.com/PaddlePaddle/Paddle/pull/66570)、[#66967](https://github.com/PaddlePaddle/Paddle/pull/66967)、[#66595](https:
//github.com/PaddlePaddle/Paddle/pull/66595)、[#67121](https://github.com/PaddlePaddle/Paddle/pull/67121)、[#67206](https://github.com/PaddlePaddle/Paddle/pull/67206)、[#67444](https://github.com/PaddlePaddle/Paddle/pull/67444)、[#67494](https://github.com/PaddlePaddle/Paddle/pull/67494)、[#67499](https://github.com/PaddlePaddle/Paddle/pull/67499)、[#67267](https://github.com/PaddlePaddle/Paddle/pull/67267)、[#67567](https://github.com/PaddlePaddle/Paddle/pull/67567)、[#67455](https://github.com/PaddlePaddle/Paddle/pull/67455)、[#67161](https://github.com/PaddlePaddle/Paddle/pull/67161)、[#67581](https://github.com/PaddlePaddle/Paddle/pull/67581)、[#67539](https://github.com/PaddlePaddle/Paddle/pull/67539)、[#67625](https://github.com/PaddlePaddle/Paddle/pull/67625)、[#67690](https://github.com/PaddlePaddle/Paddle/pull/67690)、[#67454](https://github.com/PaddlePaddle/Paddle/pull/67454)、[#67731](https://github.com/PaddlePaddle/Paddle/pull/67731)、[#67734](https://github.com/PaddlePaddle/Paddle/pull/67734)、[#67735](https://github.com/PaddlePaddle/Paddle/pull/67735)、[#67607](https://github.com/PaddlePaddle/Paddle/pull/67607)、[#67413](https://github.com/PaddlePaddle/Paddle/pull/67413)、[#67387](https://github.com/PaddlePaddle/Paddle/pull/67387)、[#67882](https://github.com/PaddlePaddle/Paddle/pull/67882)、[#67864](https://github.com/PaddlePaddle/Paddle/pull/67864)、[#67503](https://github.com/PaddlePaddle/Paddle/pull/67503)、[#67861](https://github.com/PaddlePaddle/Paddle/pull/67861)、[#67888](https://github.com/PaddlePaddle/Paddle/pull/67888)、[#67884](https://github.com/PaddlePaddle/Paddle/pull/67884)、[#67826](https://github.com/PaddlePaddle/Paddle/pull/67826)、[#68044](https://github.com/PaddlePaddle/Paddle/pull/68044)、[#67851](https://github.com/PaddlePaddle/Paddle/pull/67851)、[#68276](https://github.com/PaddlePaddle/Paddle/pull/68276)、[#69888](https://github.com/PaddlePaddle/Paddle/pull/69888)、[#70093](https://github.com/PaddlePaddle/Paddle/pull/70093)、[#70436](https://github.com/PaddleP
addle/Paddle/pull/70436)、[#70914](https://github.com/PaddlePaddle/Paddle/pull/70914)、[#71222](https://github.com/PaddlePaddle/Paddle/pull/71222)) +7. Optimized some front-end passes to enhance the robustness of the front-end processing flow and improve the performance of computationally intensive subgraphs. ([#65142](https://github.com/PaddlePaddle/Paddle/pull/65142), [#67466](https://github.com/PaddlePaddle/Paddle/pull/67466), [#69228](https://github.com/PaddlePaddle/Paddle/pull/69228), [#70994](https://github.com/PaddlePaddle/Paddle/pull/70994), [#71226](https://github.com/PaddlePaddle/Paddle/pull/71226), [#71297](https://github.com/PaddlePaddle/Paddle/pull/71297), [#71443](https://github.com/PaddlePaddle/Paddle/pull/71443)) +8. Designed new backend IR basic components and related Pass interfaces to provide a more concise and efficient way of developing optimization strategies. Through automatic pruning strategies, it can effectively reduce the traversal overhead of backend IR. ([#70485](https://github.com/PaddlePaddle/Paddle/pull/70485), [#70765](https://github.com/PaddlePaddle/Paddle/pull/70765), [#71042](https://github.com/PaddlePaddle/Paddle/pull/71042), [#70952](https://github.com/PaddlePaddle/Paddle/pull/70952), [#69454](https://github.com/PaddlePaddle/Paddle/pull/69454), [#70361](https://github.com/PaddlePaddle/Paddle/pull/70361), [#70334](https://github.com/PaddlePaddle/Paddle/pull/70334), [#70406](https://github.com/PaddlePaddle/Paddle/pull/70406), [#70191](https://github.com/PaddlePaddle/Paddle/pull/70191), [#70462](https://github.com/PaddlePaddle/Paddle/pull/70462), [#70548](https://github.com/PaddlePaddle/Paddle/pull/70548), [#70592](https://github.com/PaddlePaddle/Paddle/pull/70592), [#70437](https://github.com/PaddlePaddle/Paddle/pull/70437), [#70619](https://github.com/PaddlePaddle/Paddle/pull/70619), [#70543](https://github.com/PaddlePaddle/Paddle/pull/70543), [#69611](https://github.com/PaddlePaddle/Paddle/pull/69611), 
[#70739](https://github.com/PaddlePaddle/Paddle/pull/70739), [#70533](https://github.com/PaddlePaddle/Paddle/pull/70533), [#70696](https://github.com/PaddlePaddle/Paddle/pull/70696), [#70498](https://github.com/PaddlePaddle/Paddle/pull/70498), [#70829](https://github.com/PaddlePaddle/Paddle/pull/70829), [#71111](https://github.com/PaddlePaddle/Paddle/pull/71111), [#70883](https://github.com/PaddlePaddle/Paddle/pull/70883)) + +### Bug fixes + +1. Fix some bugs in the derivation and implementation logic of operator symbols. ([#65185](https://github.com/PaddlePaddle/Paddle/pull/65185), [#65231](https://github.com/PaddlePaddle/Paddle/pull/65231), [#65266](https://github.com/PaddlePaddle/Paddle/pull/65266), [#65951](https://github.com/PaddlePaddle/Paddle/pull/65951), [#67142](https://github.com/PaddlePaddle/Paddle/pull/67142), [#67286](https://github.com/PaddlePaddle/Paddle/pull/67286), [#65958](https://github.com/PaddlePaddle/Paddle/pull/65958), [#65955](https://github.com/PaddlePaddle/Paddle/pull/65955), [#66470](https://github.com/PaddlePaddle/Paddle/pull/66470), [#66764](https://github.com/PaddlePaddle/Paddle/pull/66764), [#66036](https://github.com/PaddlePaddle/Paddle/pull/66036), [#66662](https://github.com/PaddlePaddle/Paddle/pull/66662), [#66741](https://github.com/PaddlePaddle/Paddle/pull/66741), [#66745](https://github.com/PaddlePaddle/Paddle/pull/66745), [#66807](https://github.com/PaddlePaddle/Paddle/pull/66807), [#66791](https://github.com/PaddlePaddle/Paddle/pull/66791), [#66859](https://github.com/PaddlePaddle/Paddle/pull/66859), [#66880](https://github.com/PaddlePaddle/Paddle/pull/66880), [#66962](https://github.com/PaddlePaddle/Paddle/pull/66962)) +2. Fixed bugs in the lowering of some special operators to the compiler. 
([#68698](https://github.com/PaddlePaddle/Paddle/pull/68698), [#68699](https://github.com/PaddlePaddle/Paddle/pull/68699), [#68691](https://github.com/PaddlePaddle/Paddle/pull/68691), [#68948](https://github.com/PaddlePaddle/Paddle/pull/68948), [#70144](https://github.com/PaddlePaddle/Paddle/pull/70144), [#70895](https://github.com/PaddlePaddle/Paddle/pull/70895)) +3. Fixed the issue of errors reported in some scenarios when integrating operators. ([#67038](https://github.com/PaddlePaddle/Paddle/pull/67038), [#67400](https://github.com/PaddlePaddle/Paddle/pull/67400), [#67655](https://github.com/PaddlePaddle/Paddle/pull/67655), [#67723](https://github.com/PaddlePaddle/Paddle/pull/67723), [#68029](https://github.com/PaddlePaddle/Paddle/pull/68029), [#68042](https://github.com/PaddlePaddle/Paddle/pull/68042), [#68888](https://github.com/PaddlePaddle/Paddle/pull/68888), [#69250](https://github.com/PaddlePaddle/Paddle/pull/69250), [#69937](https://github.com/PaddlePaddle/Paddle/pull/69937), [#70924](https://github.com/PaddlePaddle/Paddle/pull/70924)) +4. Fix the correctness issue of the backend when handling extreme values, and improve the robustness of the compiler. ([#68327](https://github.com/PaddlePaddle/Paddle/pull/68327)) +5. Fixed implementation logic bugs in the backend Schedule and post-processing tuning process, resolving errors and performance issues in some cases. ([#68605](https://github.com/PaddlePaddle/Paddle/pull/68605), [#68937](https://github.com/PaddlePaddle/Paddle/pull/68937), [#68587](https://github.com/PaddlePaddle/Paddle/pull/68587), [#69060](https://github.com/PaddlePaddle/Paddle/pull/69060), [#69608](https://github.com/PaddlePaddle/Paddle/pull/69608), [#71471](https://github.com/PaddlePaddle/Paddle/pull/71471), [#71068](https://github.com/PaddlePaddle/Paddle/pull/71068)) +6. Resolved the issue of randomness in the operator fusion process. 
([#69547](https://github.com/PaddlePaddle/Paddle/pull/69547), [#70931](https://github.com/PaddlePaddle/Paddle/pull/70931)) -1. Upgrade the new automatic subgraph fusion mechanism, and innovatively propose the TrivialOp and ReduceOp fusion theory, supporting a wider range of vertical fusion and horizontal fusion, ensuring the correctness and robustness of subgraph fusion, and giving full play to the fusion potential of the neural network compiler.([#63340](https://github.com/PaddlePaddle/Paddle/pull/63340)、[#63913](https://github.com/PaddlePaddle/Paddle/pull/63913)、[#63579](https://github.com/PaddlePaddle/Paddle/pull/63579)、[#63605](https://github.com/PaddlePaddle/Paddle/pull/63605)、[#60769](https://github.com/PaddlePaddle/Paddle/pull/60769)、[#62088](https://github.com/PaddlePaddle/Paddle/pull/62088)、[#63124](https://github.com/PaddlePaddle/Paddle/pull/63124)、[#63658](https://github.com/PaddlePaddle/Paddle/pull/63658)、[#64557](https://github.com/PaddlePaddle/Paddle/pull/64557)、[#63318](https://github.com/PaddlePaddle/Paddle/pull/63318)、[#62545](https://github.com/PaddlePaddle/Paddle/pull/62545)) -2. Add the symbol derivation function of dynamic shapes. 
Based on the Shape Dialect, realize the dynamic symbol construction, automatic derivation, constraint expression, symbol simplification and other mechanisms, introduce the DimExpr concept, upgrade the support for the PaddlePaddle framework of the InferSymbolicShape logic of the 150 + typical primitive operators, and provide more information for training and inference with compiler support for dynamic shapes.([#60843](https://github.com/PaddlePaddle/Paddle/pull/60843)、[#62662](https://github.com/PaddlePaddle/Paddle/pull/62662)、[#63790](https://github.com/PaddlePaddle/Paddle/pull/63790)、[#60098](https://github.com/PaddlePaddle/Paddle/pull/60098)、[#60511](https://github.com/PaddlePaddle/Paddle/pull/60511)、[#61232](https://github.com/PaddlePaddle/Paddle/pull/61232)、[#61939](https://github.com/PaddlePaddle/Paddle/pull/61939)、[#62798](https://github.com/PaddlePaddle/Paddle/pull/62798)、[#62955](https://github.com/PaddlePaddle/Paddle/pull/62955)、[#63029](https://github.com/PaddlePaddle/Paddle/pull/63029)、[#60572](https://github.com/PaddlePaddle/Paddle/pull/60572)、[#61035](https://github.com/PaddlePaddle/Paddle/pull/61035)、[#61224](https://github.com/PaddlePaddle/Paddle/pull/61224)、[#61587](https://github.com/PaddlePaddle/Paddle/pull/61587)、[#61937](https://github.com/PaddlePaddle/Paddle/pull/61937)、[#62314](https://github.com/PaddlePaddle/Paddle/pull/62314)、[#62394](https://github.com/PaddlePaddle/Paddle/pull/62394)、[#62569](https://github.com/PaddlePaddle/Paddle/pull/62569)、[#62495](https://github.com/PaddlePaddle/Paddle/pull/62495)、[#62844](https://github.com/PaddlePaddle/Paddle/pull/62844)、[#63000](https://github.com/PaddlePaddle/Paddle/pull/63000)、[#63016](https://github.com/PaddlePaddle/Paddle/pull/63016)、[#64222](https://github.com/PaddlePaddle/Paddle/pull/64222)、[#60129](https://github.com/PaddlePaddle/Paddle/pull/60129)、[#60899](https://github.com/PaddlePaddle/Paddle/pull/60899)、[#61342](https://github.com/PaddlePaddle/Paddle/pull/61342)、[#61439](https://github.com/
PaddlePaddle/Paddle/pull/61439)、[#62766](https://github.com/PaddlePaddle/Paddle/pull/62766)、[#61133](https://github.com/PaddlePaddle/Paddle/pull/61133)、[#61430](https://github.com/PaddlePaddle/Paddle/pull/61430)、[#61498](https://github.com/PaddlePaddle/Paddle/pull/61498)、[#61680](https://github.com/PaddlePaddle/Paddle/pull/61680)、[#63367](https://github.com/PaddlePaddle/Paddle/pull/63367)、[#62151](https://github.com/PaddlePaddle/Paddle/pull/62151)、[#62665](https://github.com/PaddlePaddle/Paddle/pull/62665)、[#61407](https://github.com/PaddlePaddle/Paddle/pull/61407)、[#61502](https://github.com/PaddlePaddle/Paddle/pull/61502)、[#61655](https://github.com/PaddlePaddle/Paddle/pull/61655)、[#64115](https://github.com/PaddlePaddle/Paddle/pull/64115)、[#61791](https://github.com/PaddlePaddle/Paddle/pull/61791)、[#62141](https://github.com/PaddlePaddle/Paddle/pull/62141)、[#63422](https://github.com/PaddlePaddle/Paddle/pull/63422)、[#63577](https://github.com/PaddlePaddle/Paddle/pull/63577)、[#63978](https://github.com/PaddlePaddle/Paddle/pull/63978)、[#63576](https://github.com/PaddlePaddle/Paddle/pull/63576)、[#63947](https://github.com/PaddlePaddle/Paddle/pull/63947)、[#64332](https://github.com/PaddlePaddle/Paddle/pull/64332)、[#63990](https://github.com/PaddlePaddle/Paddle/pull/63990)) -3. 
Add the Pass Pipline function, including PdToCinn, CinnPreprocess, BuildGroupOp, GroupClusterOp, CinnLowering, Accuracy Check and other Pass strategies, to support the Lowering and execution of subgraphs in dynamic and static shapes, with a clear architecture.([#61611](https://github.com/PaddlePaddle/Paddle/pull/61611)、[#62612](https://github.com/PaddlePaddle/Paddle/pull/62612)、[#64354](https://github.com/PaddlePaddle/Paddle/pull/64354)、[#61848](https://github.com/PaddlePaddle/Paddle/pull/61848)、[#62316](https://github.com/PaddlePaddle/Paddle/pull/62316)、[#64152](https://github.com/PaddlePaddle/Paddle/pull/64152)、[#61619](https://github.com/PaddlePaddle/Paddle/pull/61619)、[#62318](https://github.com/PaddlePaddle/Paddle/pull/62318)、[#61977](https://github.com/PaddlePaddle/Paddle/pull/61977)、[#62211](https://github.com/PaddlePaddle/Paddle/pull/62211)、[#63972](https://github.com/PaddlePaddle/Paddle/pull/63972)、[#63686](https://github.com/PaddlePaddle/Paddle/pull/63686)、[#64505](https://github.com/PaddlePaddle/Paddle/pull/64505)) -4. 
Add the support for BuketLower and DyShapeSchdule functions, to realize automatic bucket compilation and optimization according to the range of dynamic shapes; and adapt and upgrade the logic of CodeGen module to support the generation of InferShape function and the distribution of conditional branching function of Host function, so as to support the acceleration of training inference under the dynamic Shape of large models.([#62730](https://github.com/PaddlePaddle/Paddle/pull/62730)、[#61115](https://github.com/PaddlePaddle/Paddle/pull/61115)、[#59941](https://github.com/PaddlePaddle/Paddle/pull/59941)、[#62207](https://github.com/PaddlePaddle/Paddle/pull/62207)、[#64318](https://github.com/PaddlePaddle/Paddle/pull/64318)、[#64345](https://github.com/PaddlePaddle/Paddle/pull/64345)、[#60519](https://github.com/PaddlePaddle/Paddle/pull/60519)、[#62584](https://github.com/PaddlePaddle/Paddle/pull/62584)、[#60828](https://github.com/PaddlePaddle/Paddle/pull/60828)、[#60533](https://github.com/PaddlePaddle/Paddle/pull/60533)、[#61436](https://github.com/PaddlePaddle/Paddle/pull/61436)、[#62071](https://github.com/PaddlePaddle/Paddle/pull/62071)、[#63971](https://github.com/PaddlePaddle/Paddle/pull/63971)、[#61656](https://github.com/PaddlePaddle/Paddle/pull/61656)、[#63083](https://github.com/PaddlePaddle/Paddle/pull/63083)、[#64405](https://github.com/PaddlePaddle/Paddle/pull/64405)、[#63047](https://github.com/PaddlePaddle/Paddle/pull/63047)、[#64655](https://github.com/PaddlePaddle/Paddle/pull/64655)、[#63095](https://github.com/PaddlePaddle/Paddle/pull/63095)、[#63829](https://github.com/PaddlePaddle/Paddle/pull/63829)、[#63572](https://github.com/PaddlePaddle/Paddle/pull/63572)) -5. 
Add support for compilation caching strategy, to automatically recognize, merge and reuse compilation results of the same subgraph structure, improve compilation efficiency by using multi-threading, so as to enhance the user experience.([#62952](https://github.com/PaddlePaddle/Paddle/pull/62952)、[#63269](https://github.com/PaddlePaddle/Paddle/pull/63269)、[#64718](https://github.com/PaddlePaddle/Paddle/pull/64718)、[#61367](https://github.com/PaddlePaddle/Paddle/pull/61367)、[#63305](https://github.com/PaddlePaddle/Paddle/pull/63305)、[#63750](https://github.com/PaddlePaddle/Paddle/pull/63750)、[#63871](https://github.com/PaddlePaddle/Paddle/pull/63871)、[#64893](https://github.com/PaddlePaddle/Paddle/pull/64893)) -6. Add support for GenerateShape mechanism, add corresponding AST Compute operator definitions, support automatic resolution of dynamic symbols, and automatic generation of ShapeOp in the Lowering stage.([#64167](https://github.com/PaddlePaddle/Paddle/pull/64167)、[#64636](https://github.com/PaddlePaddle/Paddle/pull/64636)、[#61993](https://github.com/PaddlePaddle/Paddle/pull/61993)、[#64843](https://github.com/PaddlePaddle/Paddle/pull/64843)、[#62587](https://github.com/PaddlePaddle/Paddle/pull/62587)) +## 4. Automatic parallel architecture -### Function Optimization +In the official 3.0 version, we have conducted in-depth verification and refinement of the automatic parallel architecture to better support the pre-training + fine-tuning process for common large model scenarios such as pure text dense models, pure text sparse models (MoE), and multi-modal understanding models. Specifically, we have added segmentation derivation rules for over 20 operators tailored for these scenarios, and support the conversion of automatic parallel training parameters into manual parallel parameters for downstream inference, making automatic parallelism fully usable and helping users reduce the development cost of large model parallel programs. 
Additionally, to further simplify the distributed development process for users, we have introduced a new `paddle.distributed.parallel` interface. Based on the encapsulation of distributed tensor notation syntax, it supports users in non-intrusively configuring common parallel strategies such as data parallelism, model parallelism, and pipeline parallelism outside of model networking. Furthermore, the static graph automatic parallel architecture has undergone a comprehensive upgrade based on PIR, with the underlying basic components, core modules, parallel strategies, and performance optimization strategies all implemented uniformly based on the extended PIR `DistDialect`. This has further enhanced the dynamic and static consistency of automatic parallelism, achieving performance levels on the Llama series models that are on par with or even surpass manual parallelism. -1. Optimize BuildCinnPass logic, upgrade the compiler's perception strategy for black and white list operators, and improve the robustness of Pass logic.([#62372](https://github.com/PaddlePaddle/Paddle/pull/62372)、[#61081](https://github.com/PaddlePaddle/Paddle/pull/61081)、[#61225](https://github.com/PaddlePaddle/Paddle/pull/61225)、[#58863](https://github.com/PaddlePaddle/Paddle/pull/58863)) -2. Optimize the OpLoweringGroup data structure, remove unnecessary interfaces and members, and reduce the coupling between upstream and downstream modules.([#62339](https://github.com/PaddlePaddle/Paddle/pull/62339)) -3. Optimize the component design of the compiler on the architecture Arch, to abstract the concept of hardware, and reduce the cost of adapting to domestic hardware.([#63530](https://github.com/PaddlePaddle/Paddle/pull/63530)、[#64347](https://github.com/PaddlePaddle/Paddle/pull/64347)、[#64506](https://github.com/PaddlePaddle/Paddle/pull/64506)、[#64587](https://github.com/PaddlePaddle/Paddle/pull/64587)) -4. 
Upgrade the AST Compute module of the compiler's back-end operator, to adapt to support the computing logic of dynamic Shape.([#62488](https://github.com/PaddlePaddle/Paddle/pull/62488)、[#63581](https://github.com/PaddlePaddle/Paddle/pull/63581)、[#63687](https://github.com/PaddlePaddle/Paddle/pull/63687)、[#63654](https://github.com/PaddlePaddle/Paddle/pull/63654)、[#64217](https://github.com/PaddlePaddle/Paddle/pull/64217)) +### New Features + +- Added the `paddle.distributed.parallel` interface to support configuring common parallel strategies outside of model networking, simplifying the distributed development process. [#69004](https://github.com/PaddlePaddle/Paddle/pull/69004), [#69033](https://github.com/PaddlePaddle/Paddle/pull/69033), [#69077](https://github.com/PaddlePaddle/Paddle/pull/69077), [#69136](https://github.com/PaddlePaddle/Paddle/pull/69136), [#69169](https://github.com/PaddlePaddle/Paddle/pull/69169), [#69212](https://github.com/PaddlePaddle/Paddle/pull/69212), [#69217](https://github.com/PaddlePaddle/Paddle/pull/69217), [#69283](https://github.com/PaddlePaddle/Paddle/pull/69283), [#69288](https://github.com/PaddlePaddle/Paddle/pull/69288), [#69326](https://github.com/PaddlePaddle/Paddle/pull/69326), [#69365](https://github.com/PaddlePaddle/Paddle/pull/69365), [#69384](https://github.com/PaddlePaddle/Paddle/pull/69384), [#69426](https://github.com/PaddlePaddle/Paddle/pull/69426), [#69443](https://github.com/PaddlePaddle/Paddle/pull/69443), [#69462](https://github.com/PaddlePaddle/Paddle/pull/69462), [#69492](https://github.com/PaddlePaddle/Paddle/pull/69492), [#69628](https://github.com/PaddlePaddle/Paddle/pull/69628), [#69677](https://github.com/PaddlePaddle/Paddle/pull/69677), [#69697](https://github.com/PaddlePaddle/Paddle/pull/69697), [#69776](https://github.com/PaddlePaddle/Paddle/pull/69776), [#69896](https://github.com/PaddlePaddle/Paddle/pull/69896), [#70138](https://github.com/PaddlePaddle/Paddle/pull/70138), 
[#70182](https://github.com/PaddlePaddle/Paddle/pull/70182), [#70539](https://github.com/PaddlePaddle/Paddle/pull/70539), [#71116](https://github.com/PaddlePaddle/Paddle/pull/71116), [#71210](https://github.com/PaddlePaddle/Paddle/pull/71210) +- For pure text sparse scenarios, it supports MoE expert parallelism, implements an expert parallelism to mesh partitioning conversion mechanism, and supports automatic invocation of all2all communication. [#66462](https://github.com/PaddlePaddle/Paddle/pull/66462), [#66750](https://github.com/PaddlePaddle/Paddle/pull/66750), [#68004](https://github.com/PaddlePaddle/Paddle/pull/68004), [#68053](https://github.com/PaddlePaddle/Paddle/pull/68053), [#68187](https://github.com/PaddlePaddle/Paddle/pull/68187), [#68477](https://github.com/PaddlePaddle/Paddle/pull/68477), [#69098](https://github.com/PaddlePaddle/Paddle/pull/69098), [#69262](https://github.com/PaddlePaddle/Paddle/pull/69262), [#69296](https://github.com/PaddlePaddle/Paddle/pull/69296), [#70715](https://github.com/PaddlePaddle/Paddle/pull/70715), [#71292](https://github.com/PaddlePaddle/Paddle/pull/71292), [#71320](https://github.com/PaddlePaddle/Paddle/pull/71320) +- To meet the needs of users in extreme manual optimization scenarios for managing segmentation status and communication operations, and to address the issue of being unable to use tensor segmentation syntax in some non-SPMD scenarios, we have added the `LocalLayer` interface to support a hybrid network of automatic and manual parallelism. 
[#70519](https://github.com/PaddlePaddle/Paddle/pull/70519), [#70525](https://github.com/PaddlePaddle/Paddle/pull/70525), [#70600](https://github.com/PaddlePaddle/Paddle/pull/70600), [#71232](https://github.com/PaddlePaddle/Paddle/pull/71232), [#71264](https://github.com/PaddlePaddle/Paddle/pull/71264), [#71373](https://github.com/PaddlePaddle/Paddle/pull/71373) +- To enable users to run automatic parallel programs using domestic hardware, we have completed the adaptation for Kunlun chips, and support for other chips is also underway. [#70997](https://github.com/PaddlePaddle/Paddle/pull/70997), [#71126](https://github.com/PaddlePaddle/Paddle/pull/71126), [#71229](https://github.com/PaddlePaddle/Paddle/pull/71229), [#71289](https://github.com/PaddlePaddle/Paddle/pull/71289), [#71425](https://github.com/PaddlePaddle/Paddle/pull/71425), [#71500](https://github.com/PaddlePaddle/Paddle/pull/71500) +- For situations where the data dimension cannot be divided evenly by the device dimension, non-balanced splitting derivation and splitting transformation are supported. [#66103](https://github.com/PaddlePaddle/Paddle/pull/66103), [#67756](https://github.com/PaddlePaddle/Paddle/pull/67756), [#69265](https://github.com/PaddlePaddle/Paddle/pull/69265), [#70072](https://github.com/PaddlePaddle/Paddle/pull/70072) +- The `shard_dataloader` function has been upgraded to support setting the gradient accumulation step count through `batch_sampler`, and also supports scenarios with multiple model inputs. [#65325](https://github.com/PaddlePaddle/Paddle/pull/65325), [#70659](https://github.com/PaddlePaddle/Paddle/pull/70659) +- Upgrades have been made to the parameter saving and loading functions, supporting asynchronous storage of parameters, mutual loading of `master_weight` between dynamic and static graphs, as well as parameter version control and offload functions. 
[#66858](https://github.com/PaddlePaddle/Paddle/pull/66858), [#67427](https://github.com/PaddlePaddle/Paddle/pull/67427), [#70105](https://github.com/PaddlePaddle/Paddle/pull/70105), [#70639](https://github.com/PaddlePaddle/Paddle/pull/70639) +- To meet users' needs for converting dynamic networking involving `PyLayer` to static, support has been added for `PyLayer` in static graph mode, allowing distributed tensors to be run within `PyLayer`. [#67326](https://github.com/PaddlePaddle/Paddle/pull/67326), [#68190](https://github.com/PaddlePaddle/Paddle/pull/68190), [#69089](https://github.com/PaddlePaddle/Paddle/pull/69089), [#70831](https://github.com/PaddlePaddle/Paddle/pull/70831) +- To address the issue of incorrect dynamic-to-static conversion caused by inconsistency between the data stream input format and the `input_spec` actually required by the model for dynamic-to-static conversion, the dynamic-to-static conversion interface supports a user-defined `input_spec` feature, allowing users to input the required `input_spec` on their own. [#69183](https://github.com/PaddlePaddle/Paddle/pull/69183) +- For hybrid parallel scenarios, the gradient clipping strategy has been adapted and supported. [#65259](https://github.com/PaddlePaddle/Paddle/pull/65259), [#65928](https://github.com/PaddlePaddle/Paddle/pull/65928), [#69287](https://github.com/PaddlePaddle/Paddle/pull/69287), [#69760](https://github.com/PaddlePaddle/Paddle/pull/69760), [#71421](https://github.com/PaddlePaddle/Paddle/pull/71421) +- For scenarios where the number of model layers is not divisible by the number of devices, a non-balanced pipeline parallel strategy is supported, allowing users to split different numbers of network layers at different pipeline stages. 
[#69728](https://github.com/PaddlePaddle/Paddle/pull/69728), [#70164](https://github.com/PaddlePaddle/Paddle/pull/70164), [#70230](https://github.com/PaddlePaddle/Paddle/pull/70230) +- Added `set_mesh` and `get_mesh` interfaces to enable users to easily set and retrieve the global mesh. [#69999](https://github.com/PaddlePaddle/Paddle/pull/69999) +- Added automatic and manual parallelism accuracy alignment switches to facilitate the conversion of existing manual parallelism models to automatic parallelism and verify the accuracy of the results. [#67681](https://github.com/PaddlePaddle/Paddle/pull/67681) + +### Functional improvements + +Improve and optimize the derivation rules for operator slicing + +- Added derivation rules for operators `add_n`, `split`, and `softmax_grad`. [#65606](https://github.com/PaddlePaddle/Paddle/pull/65606), [#69439](https://github.com/PaddlePaddle/Paddle/pull/69439) +- Added operator splitting derivation rules for `assign` and `embedding_grad`. [#67457](https://github.com/PaddlePaddle/Paddle/pull/67457) +- Added `clip` operator slicing derivation rule. [#70632](https://github.com/PaddlePaddle/Paddle/pull/70632) +- Added derivation rules for the `dist_stack` and `gather_nd` operators. [#65426](https://github.com/PaddlePaddle/Paddle/pull/65426) +- Added the derivation rule for `dropout` operator segmentation. [#70216](https://github.com/PaddlePaddle/Paddle/pull/70216) +- Added slicing derivation rule for `fused_dropout_add` operator. [#67722](https://github.com/PaddlePaddle/Paddle/pull/67722) +- Added `fast_ln` custom operator segmentation derivation rule. [#68148](https://github.com/PaddlePaddle/Paddle/pull/68148) +- Added `greater_equal` and `less_equal` operator slicing derivation rules. [#68868](https://github.com/PaddlePaddle/Paddle/pull/68868) +- Added `greater_than` and `less_than` operator slicing derivation rules. [#68133](https://github.com/PaddlePaddle/Paddle/pull/68133) +- Added `if` operator segmentation derivation rule. 
[#69357](https://github.com/PaddlePaddle/Paddle/pull/69357) +- Added slicing derivation rules for operators `logical_and`, `logical_not`, `logical_or`, and `logical_xor`. [#67840](https://github.com/PaddlePaddle/Paddle/pull/67840) +- Added `logsumexp` operator slicing derivation rule. [#67840](https://github.com/PaddlePaddle/Paddle/pull/67840) +- Added `non_zero` operator slicing derivation rule. [#67996](https://github.com/PaddlePaddle/Paddle/pull/67996) +- Added `pad` operator slicing derivation rule. [#68304](https://github.com/PaddlePaddle/Paddle/pull/68304) +- Added the derivation rule for operator segmentation of `p_norm`. [#68317](https://github.com/PaddlePaddle/Paddle/pull/68317) +- Added the derivation rule for the `scatter_nd` operator's slicing. [#67980](https://github.com/PaddlePaddle/Paddle/pull/67980) +- Added `sigmoid` operator segmentation derivation rule. [#71092](https://github.com/PaddlePaddle/Paddle/pull/71092) + +Static graph automatic parallel architecture based on PIR upgrade + +- Upgrades to Automatic Mixed Precision (AMP) training. [#65089](https://github.com/PaddlePaddle/Paddle/pull/65089), [#65892](https://github.com/PaddlePaddle/Paddle/pull/65892), [#66418](https://github.com/PaddlePaddle/Paddle/pull/66418), [#66674](https://github.com/PaddlePaddle/Paddle/pull/66674), [#68545](https://github.com/PaddlePaddle/Paddle/pull/68545) +- Upgrade of recalculation strategy. [#69681](https://github.com/PaddlePaddle/Paddle/pull/69681), [#70064](https://github.com/PaddlePaddle/Paddle/pull/70064) +- Upgrades to the parameter slicing parallel strategy. 
[#63542](https://github.com/PaddlePaddle/Paddle/pull/63542), [#67748](https://github.com/PaddlePaddle/Paddle/pull/67748), [#68288](https://github.com/PaddlePaddle/Paddle/pull/68288), [#68314](https://github.com/PaddlePaddle/Paddle/pull/68314), [#69059](https://github.com/PaddlePaddle/Paddle/pull/69059), [#71167](https://github.com/PaddlePaddle/Paddle/pull/71167) +- Upgrading the pipeline parallelism strategy. [#66810](https://github.com/PaddlePaddle/Paddle/pull/66810), [#67174](https://github.com/PaddlePaddle/Paddle/pull/67174), [#67522](https://github.com/PaddlePaddle/Paddle/pull/67522), [#68141](https://github.com/PaddlePaddle/Paddle/pull/68141), [#68742](https://github.com/PaddlePaddle/Paddle/pull/68742), [#68962](https://github.com/PaddlePaddle/Paddle/pull/68962), [#69052](https://github.com/PaddlePaddle/Paddle/pull/69052), [#69201](https://github.com/PaddlePaddle/Paddle/pull/69201), [#69244](https://github.com/PaddlePaddle/Paddle/pull/69244), [#69578](https://github.com/PaddlePaddle/Paddle/pull/69578), [#69584](https://github.com/PaddlePaddle/Paddle/pull/69584), [#69654](https://github.com/PaddlePaddle/Paddle/pull/69654), [#69799](https://github.com/PaddlePaddle/Paddle/pull/69799), [#69894](https://github.com/PaddlePaddle/Paddle/pull/69894), [#70360](https://github.com/PaddlePaddle/Paddle/pull/70360), [#70615](https://github.com/PaddlePaddle/Paddle/pull/70615) +- Gradient accumulation strategy upgrade. 
[#66641](https://github.com/PaddlePaddle/Paddle/pull/66641), [#67254](https://github.com/PaddlePaddle/Paddle/pull/67254), [#67907](https://github.com/PaddlePaddle/Paddle/pull/67907), [#68391](https://github.com/PaddlePaddle/Paddle/pull/68391), [#68460](https://github.com/PaddlePaddle/Paddle/pull/68460), [#68472](https://github.com/PaddlePaddle/Paddle/pull/68472), [#68664](https://github.com/PaddlePaddle/Paddle/pull/68664), [#68727](https://github.com/PaddlePaddle/Paddle/pull/68727), [#69171](https://github.com/PaddlePaddle/Paddle/pull/69171), [#69805](https://github.com/PaddlePaddle/Paddle/pull/69805) +- Operator fusion strategy upgrade. [#68087](https://github.com/PaddlePaddle/Paddle/pull/68087), [#68207](https://github.com/PaddlePaddle/Paddle/pull/68207), [#68383](https://github.com/PaddlePaddle/Paddle/pull/68383), [#68623](https://github.com/PaddlePaddle/Paddle/pull/68623), [#68650](https://github.com/PaddlePaddle/Paddle/pull/68650), [#68736](https://github.com/PaddlePaddle/Paddle/pull/68736), [#69103](https://github.com/PaddlePaddle/Paddle/pull/69103), [#70889](https://github.com/PaddlePaddle/Paddle/pull/70889) +- The `tensor_fusion` optimization strategy has been upgraded. [#66130](https://github.com/PaddlePaddle/Paddle/pull/66130), [#68475](https://github.com/PaddlePaddle/Paddle/pull/68475), [#69243](https://github.com/PaddlePaddle/Paddle/pull/69243), [#69560](https://github.com/PaddlePaddle/Paddle/pull/69560), [#69823](https://github.com/PaddlePaddle/Paddle/pull/69823), [#70195](https://github.com/PaddlePaddle/Paddle/pull/70195), [#70309](https://github.com/PaddlePaddle/Paddle/pull/70309), [#70363](https://github.com/PaddlePaddle/Paddle/pull/70363), [#70869](https://github.com/PaddlePaddle/Paddle/pull/70869) +- Tensor parallel optimization strategy upgrade. [#68182](https://github.com/PaddlePaddle/Paddle/pull/68182), [#68389](https://github.com/PaddlePaddle/Paddle/pull/68389) +- Upgrade of custom operator segmentation derivation mechanism. 
[#67614](https://github.com/PaddlePaddle/Paddle/pull/67614) +- Upgrades to the parameter saving and loading mechanism. [#66416](https://github.com/PaddlePaddle/Paddle/pull/66416), [#67045](https://github.com/PaddlePaddle/Paddle/pull/67045), [#67369](https://github.com/PaddlePaddle/Paddle/pull/67369), [#68203](https://github.com/PaddlePaddle/Paddle/pull/68203) +- Optimize computation graph compilation time. [#68796](https://github.com/PaddlePaddle/Paddle/pull/68796) + +### Bug fixes + +- Fixed bugs in the segmentation derivation mechanism and the segmentation derivation rules for several operators. [#65702](https://github.com/PaddlePaddle/Paddle/pull/65702), [#65835](https://github.com/PaddlePaddle/Paddle/pull/65835), [#66098](https://github.com/PaddlePaddle/Paddle/pull/66098), [#66955](https://github.com/PaddlePaddle/Paddle/pull/66955), [#67052](https://github.com/PaddlePaddle/Paddle/pull/67052), [#67059](https://github.com/PaddlePaddle/Paddle/pull/67059), [#67101](https://github.com/PaddlePaddle/Paddle/pull/67101), [#67283](https://github.com/PaddlePaddle/Paddle/pull/67283), [#67729](https://github.com/PaddlePaddle/Paddle/pull/67729), [#67996](https://github.com/PaddlePaddle/Paddle/pull/67996), [#68413](https://github.com/PaddlePaddle/Paddle/pull/68413), [#68455](https://github.com/PaddlePaddle/Paddle/pull/68455), [#68533](https://github.com/PaddlePaddle/Paddle/pull/68533), [#68976](https://github.com/PaddlePaddle/Paddle/pull/68976), [#68977](https://github.com/PaddlePaddle/Paddle/pull/68977), [#69027](https://github.com/PaddlePaddle/Paddle/pull/69027), [#69203](https://github.com/PaddlePaddle/Paddle/pull/69203), [#69223](https://github.com/PaddlePaddle/Paddle/pull/69223), [#69862](https://github.com/PaddlePaddle/Paddle/pull/69862), [#69991](https://github.com/PaddlePaddle/Paddle/pull/69991), [#70100](https://github.com/PaddlePaddle/Paddle/pull/70100), [#70624](https://github.com/PaddlePaddle/Paddle/pull/70624), 
[#71024](https://github.com/PaddlePaddle/Paddle/pull/71024), [#71152](https://github.com/PaddlePaddle/Paddle/pull/71152), [#71214](https://github.com/PaddlePaddle/Paddle/pull/71214), [#71253](https://github.com/PaddlePaddle/Paddle/pull/71253), [#71388](https://github.com/PaddlePaddle/Paddle/pull/71388) +- Fixed several bugs in the segmentation conversion mechanism. [#65060](https://github.com/PaddlePaddle/Paddle/pull/65060), [#65820](https://github.com/PaddlePaddle/Paddle/pull/65820), [#67630](https://github.com/PaddlePaddle/Paddle/pull/67630), [#67809](https://github.com/PaddlePaddle/Paddle/pull/67809), [#68115](https://github.com/PaddlePaddle/Paddle/pull/68115), [#68468](https://github.com/PaddlePaddle/Paddle/pull/68468), [#70023](https://github.com/PaddlePaddle/Paddle/pull/70023) +- Fixed the bug of incorrect derivation of `shard_degree` in parameter slice parallelism. [#68781](https://github.com/PaddlePaddle/Paddle/pull/68781), [#69214](https://github.com/PaddlePaddle/Paddle/pull/69214) +- Fixed issues in scenarios such as inconsistent results between dynamic and static graphs in `shard_dataloader`, slicing dict-type data, and custom `sampler` scenarios. [#65262](https://github.com/PaddlePaddle/Paddle/pull/65262), [#66096](https://github.com/PaddlePaddle/Paddle/pull/66096), [#66882](https://github.com/PaddlePaddle/Paddle/pull/66882), [#69620](https://github.com/PaddlePaddle/Paddle/pull/69620) +- Fixed the bug where the `recompute` setting with `use_reentrant=false` was incompatible with parameter slicing. [#65188](https://github.com/PaddlePaddle/Paddle/pull/65188) +- Fixed bugs in the parameter loading and saving functions. [#66266](https://github.com/PaddlePaddle/Paddle/pull/66266), [#69764](https://github.com/PaddlePaddle/Paddle/pull/69764) +- Fixed bugs in operators such as `Conv2D`, `fill_constant`, `flash_attn_grad`, `reduce_scatter`, `if`, `tuple_push`, and `tuple_pop`. 
[#67587](https://github.com/PaddlePaddle/Paddle/pull/67587), [#68008](https://github.com/PaddlePaddle/Paddle/pull/68008), [#68586](https://github.com/PaddlePaddle/Paddle/pull/68586), [#68589](https://github.com/PaddlePaddle/Paddle/pull/68589), [#69519](https://github.com/PaddlePaddle/Paddle/pull/69519), [#70207](https://github.com/PaddlePaddle/Paddle/pull/70207) +- Fixed bugs in communication operators such as `reduce_scatter`, `p_send`, and `p_recv`. [#67386](https://github.com/PaddlePaddle/Paddle/pull/67386), [#71433](https://github.com/PaddlePaddle/Paddle/pull/71433) +- Fixed bugs related to tensor type promotion. [#66541](https://github.com/PaddlePaddle/Paddle/pull/66541), [#68342](https://github.com/PaddlePaddle/Paddle/pull/68342) +- Fixed the bug where automatic allocation of GPU memory occurred when converting uninitialized distributed tensors to NumPy arrays on some cards. [#66361](https://github.com/PaddlePaddle/Paddle/pull/66361) +- Fixed the bug that triggered data copying when calling `to_tensor` on non-segmented tensors. [#67169](https://github.com/PaddlePaddle/Paddle/pull/67169) +- Fixed the bug related to the segmentation of the `scaler` parameter. [#68289](https://github.com/PaddlePaddle/Paddle/pull/68289) +- Fixed the accuracy issue of `enable_delay_scale_loss`. [#68525](https://github.com/PaddlePaddle/Paddle/pull/68525) +- Fixed the hang issue caused by different creation orders of communication groups. [#68847](https://github.com/PaddlePaddle/Paddle/pull/68847) +- Fixed the bug of incorrect `op_role` setting in static graph scenarios. [#67850](https://github.com/PaddlePaddle/Paddle/pull/67850), [#67986](https://github.com/PaddlePaddle/Paddle/pull/67986), [#68156](https://github.com/PaddlePaddle/Paddle/pull/68156) +- Fixed the bug where the output variable of the random number operator could not be sliced in static graphs. 
[#67589](https://github.com/PaddlePaddle/Paddle/pull/67589), [#67750](https://github.com/PaddlePaddle/Paddle/pull/67750), [#68067](https://github.com/PaddlePaddle/Paddle/pull/68067) +- Fixed the bug where the graph cache mechanism failed in static graphs. [#68488](https://github.com/PaddlePaddle/Paddle/pull/68488) +- Fixed the bug of index out-of-bounds in `paddle.distributed.to_distributed`. [#70174](https://github.com/PaddlePaddle/Paddle/pull/70174) +- Fixed a bug in the pipeline parallel visualization tool. [#71386](https://github.com/PaddlePaddle/Paddle/pull/71386) + +## 5. Operator mechanism + +Operator-related PRs, including the splitting of combined operators, the adaptation of new hardware-compatible operator kernels, sparse operator operations, and the retirement of old IR operators, have laid the foundation for PIR-compatible compilers and achieving performance advantages across multiple hardware platforms. The standardization of the operator system has optimized the code structure, reduced technical debt, and improved maintainability. -### Performance Optimization +### New Features -1. Optimize the Schedule logic of AST IR, restructure the core modules such as Vectorize, Unroll, AxisBind, and ComputeAt, and merged the iterative paths of dynamic and static shapes, so as to reduce the development and maintenance costs.([#60449](https://github.com/PaddlePaddle/Paddle/pull/60449)、[#60155](https://github.com/PaddlePaddle/Paddle/pull/60155)、[#60342](https://github.com/PaddlePaddle/Paddle/pull/60342)、[#60498](https://github.com/PaddlePaddle/Paddle/pull/60498)、[#60538](https://github.com/PaddlePaddle/Paddle/pull/60538)、[#60190](https://github.com/PaddlePaddle/Paddle/pull/60190)、[#61197](https://github.com/PaddlePaddle/Paddle/pull/61197)、[#63140](https://github.com/PaddlePaddle/Paddle/pull/63140)、[#61156](https://github.com/PaddlePaddle/Paddle/pull/61156)) -2. 
Optimize the Tiling strategy and temp Buffer function, support warp-level memory continuous Read and cache_read cache_write function, and improve the subgraph execution performance.([#64240](https://github.com/PaddlePaddle/Paddle/pull/64240)、[#60562](https://github.com/PaddlePaddle/Paddle/pull/60562)、[#64711](https://github.com/PaddlePaddle/Paddle/pull/64711)、[#62856](https://github.com/PaddlePaddle/Paddle/pull/62856)、[#61576](https://github.com/PaddlePaddle/Paddle/pull/61576)、[#61901](https://github.com/PaddlePaddle/Paddle/pull/61901)、[#62581](https://github.com/PaddlePaddle/Paddle/pull/62581)、[#61987](https://github.com/PaddlePaddle/Paddle/pull/61987)、[#60190](https://github.com/PaddlePaddle/Paddle/pull/60190)、[#63138](https://github.com/PaddlePaddle/Paddle/pull/63138)、[#62517](https://github.com/PaddlePaddle/Paddle/pull/62517)) -3. Support automatic search function of Schedule configuration and AOT offline saving mechanism to accelerate the performance of subgraph Kernel.([#64271](https://github.com/PaddlePaddle/Paddle/pull/64271)、[#64588](https://github.com/PaddlePaddle/Paddle/pull/64588)、[#64694](https://github.com/PaddlePaddle/Paddle/pull/64694)、[#64620](https://github.com/PaddlePaddle/Paddle/pull/64620)、[#64702](https://github.com/PaddlePaddle/Paddle/pull/64702)、[#63086](https://github.com/PaddlePaddle/Paddle/pull/63086)) -4. Support OptimizeReductionTactic optimization strategy to improve kernel performance in Reduce scenarios.([#6066](https://github.com/PaddlePaddle/Paddle/pull/60661)、[#61363](https://github.com/PaddlePaddle/Paddle/pull/61363)、[#60881](https://github.com/PaddlePaddle/Paddle/pull/60881)、[#63859](https://github.com/PaddlePaddle/Paddle/pull/63859)) -5. Enhance DCE Pass function, remove redundant If/For branch codes and improve execution efficiency.([#61682](https://github.com/PaddlePaddle/Paddle/pull/61682)) -6. 
Add support for FuseParallelMatmulPass Pass, integrate multiple Matmul operators to achieve acceleration.([#63623](https://github.com/PaddlePaddle/Paddle/pull/63623)) +- Support the splitting of combinatory operators. [#65148](https://github.com/PaddlePaddle/Paddle/pull/65148), [#65007](https://github.com/PaddlePaddle/Paddle/pull/65007), [#65482](https://github.com/PaddlePaddle/Paddle/pull/65482), [#65006](https://github.com/PaddlePaddle/Paddle/pull/65006), [#65692](https://github.com/PaddlePaddle/Paddle/pull/65692), [#65961](https://github.com/PaddlePaddle/Paddle/pull/65961), [#65968](https://github.com/PaddlePaddle/Paddle/pull/65968), [#65967](https://github.com/PaddlePaddle/Paddle/pull/65967), [#66510](https://github.com/PaddlePaddle/Paddle/pull/66510), [#66795](https://github.com/PaddlePaddle/Paddle/pull/66795), [#66835](https://github.com/PaddlePaddle/Paddle/pull/66835), [#67151](https://github.com/PaddlePaddle/Paddle/pull/67151), [#67342](https://github.com/PaddlePaddle/Paddle/pull/67342), [#67481](https://github.com/PaddlePaddle/Paddle/pull/67481), [#67502](https://github.com/PaddlePaddle/Paddle/pull/67502), [#67606](https://github.com/PaddlePaddle/Paddle/pull/67606), [#67757](https://github.com/PaddlePaddle/Paddle/pull/67757), [#67775](https://github.com/PaddlePaddle/Paddle/pull/67775), [#67891](https://github.com/PaddlePaddle/Paddle/pull/67891), [#67790](https://github.com/PaddlePaddle/Paddle/pull/67790), [#67965](https://github.com/PaddlePaddle/Paddle/pull/67965), [#67968](https://github.com/PaddlePaddle/Paddle/pull/67968), [#68168](https://github.com/PaddlePaddle/Paddle/pull/68168), [#68125](https://github.com/PaddlePaddle/Paddle/pull/68125), [#68228](https://github.com/PaddlePaddle/Paddle/pull/68228), [#68295](https://github.com/PaddlePaddle/Paddle/pull/68295), [#68353](https://github.com/PaddlePaddle/Paddle/pull/68353), [#68357](https://github.com/PaddlePaddle/Paddle/pull/68357), [#68827](https://github.com/PaddlePaddle/Paddle/pull/68827), 
[#68834](https://github.com/PaddlePaddle/Paddle/pull/68834), [#69239](https://github.com/PaddlePaddle/Paddle/pull/69239), [#68817](https://github.com/PaddlePaddle/Paddle/pull/68817), [#69108](https://github.com/PaddlePaddle/Paddle/pull/69108), [#69373](https://github.com/PaddlePaddle/Paddle/pull/69373), [#69372](https://github.com/PaddlePaddle/Paddle/pull/69372), [#68829](https://github.com/PaddlePaddle/Paddle/pull/68829), [#69684](https://github.com/PaddlePaddle/Paddle/pull/69684), [#68818](https://github.com/PaddlePaddle/Paddle/pull/68818), [#68835](https://github.com/PaddlePaddle/Paddle/pull/68835), [#69838](https://github.com/PaddlePaddle/Paddle/pull/69838), [#69998](https://github.com/PaddlePaddle/Paddle/pull/69998), [#69675](https://github.com/PaddlePaddle/Paddle/pull/69675), [#70367](https://github.com/PaddlePaddle/Paddle/pull/70367), [#70080](https://github.com/PaddlePaddle/Paddle/pull/70080), [#71352](https://github.com/PaddlePaddle/Paddle/pull/71352), [#66450](https://github.com/PaddlePaddle/Paddle/pull/66450), [#67593](https://github.com/PaddlePaddle/Paddle/pull/67593), [#67988](https://github.com/PaddlePaddle/Paddle/pull/67988), [#68346](https://github.com/PaddlePaddle/Paddle/pull/68346), [#68399](https://github.com/PaddlePaddle/Paddle/pull/68399), [#68319](https://github.com/PaddlePaddle/Paddle/pull/68319), [#68485](https://github.com/PaddlePaddle/Paddle/pull/68485), [#68961](https://github.com/PaddlePaddle/Paddle/pull/68961), [#68575](https://github.com/PaddlePaddle/Paddle/pull/68575) +- PIR supports `PyLayer`. [#69674](https://github.com/PaddlePaddle/Paddle/pull/69674), [#70375](https://github.com/PaddlePaddle/Paddle/pull/70375) +- Support for XPU-related operator computations. [#65684](https://github.com/PaddlePaddle/Paddle/pull/65684), [#65976](https://github.com/PaddlePaddle/Paddle/pull/65976), [#68497](https://github.com/PaddlePaddle/Paddle/pull/68497) +- PIR supports sparse operators. 
[#62663](https://github.com/PaddlePaddle/Paddle/pull/62663), [#67885](https://github.com/PaddlePaddle/Paddle/pull/67885), [#67976](https://github.com/PaddlePaddle/Paddle/pull/67976), [#68261](https://github.com/PaddlePaddle/Paddle/pull/68261), [#68326](https://github.com/PaddlePaddle/Paddle/pull/68326) +- Support manual Recompute. [#65879](https://github.com/PaddlePaddle/Paddle/pull/65879) +- Implement the kernel and register the operator. [#63130](https://github.com/PaddlePaddle/Paddle/pull/63130) +- Support for Custom Op. [#68824](https://github.com/PaddlePaddle/Paddle/pull/68824), [#68748](https://github.com/PaddlePaddle/Paddle/pull/68748) +- Added second-order backward composition for `acos` in dynamic graph mode. [#70409](https://github.com/PaddlePaddle/Paddle/pull/70409) +- Support initialization and computation of 0-size tensors. [#70504](https://github.com/PaddlePaddle/Paddle/pull/70504) -### Bug Fixing +### Bug Fixes -1. Fix the bug when Lowering some special operators to the compiler, to improve the end-to-end user experience.([#60800](https://github.com/PaddlePaddle/Paddle/pull/60800)、[#64720](https://github.com/PaddlePaddle/Paddle/pull/64720)、[#62593](https://github.com/PaddlePaddle/Paddle/pull/62593)、[#62661](https://github.com/PaddlePaddle/Paddle/pull/62661)、[#64626](https://github.com/PaddlePaddle/Paddle/pull/64626)、[#63320](https://github.com/PaddlePaddle/Paddle/pull/63320)、[#64581](https://github.com/PaddlePaddle/Paddle/pull/64581)、[#61608](https://github.com/PaddlePaddle/Paddle/pull/61608)、[#64135](https://github.com/PaddlePaddle/Paddle/pull/64135)、[#64659](https://github.com/PaddlePaddle/Paddle/pull/64659)、[#62391](https://github.com/PaddlePaddle/Paddle/pull/62391)、[#62490](https://github.com/PaddlePaddle/Paddle/pull/62490)、[#63891](https://github.com/PaddlePaddle/Paddle/pull/63891)、[#64529](https://github.com/PaddlePaddle/Paddle/pull/64529)) -2. 
Fix a bug in the symbolic derivation logic of some operators.([#62141](https://github.com/PaddlePaddle/Paddle/pull/62141)、[#62376](https://github.com/PaddlePaddle/Paddle/pull/62376)、[#62941](https://github.com/PaddlePaddle/Paddle/pull/62941)、[#63322](https://github.com/PaddlePaddle/Paddle/pull/63322)、[#64672](https://github.com/PaddlePaddle/Paddle/pull/64672)、[#64407](https://github.com/PaddlePaddle/Paddle/pull/64407)、[#60241](https://github.com/PaddlePaddle/Paddle/pull/60241)、[#60440](https://github.com/PaddlePaddle/Paddle/pull/60440)、[#62503](https://github.com/PaddlePaddle/Paddle/pull/62503)、[#62997](https://github.com/PaddlePaddle/Paddle/pull/62997)、[#63169](https://github.com/PaddlePaddle/Paddle/pull/63169)、[#61098](https://github.com/PaddlePaddle/Paddle/pull/61098)、[#63973](https://github.com/PaddlePaddle/Paddle/pull/63973)、[#62248](https://github.com/PaddlePaddle/Paddle/pull/62248)、[#62321](https://github.com/PaddlePaddle/Paddle/pull/62321)、[#63755](https://github.com/PaddlePaddle/Paddle/pull/63755)、[#63917](https://github.com/PaddlePaddle/Paddle/pull/63917)、[#63903](https://github.com/PaddlePaddle/Paddle/pull/63903)、[#64173](https://github.com/PaddlePaddle/Paddle/pull/64173)、[#64525](https://github.com/PaddlePaddle/Paddle/pull/64525)、[#64615](https://github.com/PaddlePaddle/Paddle/pull/64615)、[#62247](https://github.com/PaddlePaddle/Paddle/pull/62247)、[#62455](https://github.com/PaddlePaddle/Paddle/pull/62455)、[#62898](https://github.com/PaddlePaddle/Paddle/pull/62898)、[#62867](https://github.com/PaddlePaddle/Paddle/pull/62867)、[#63608](https://github.com/PaddlePaddle/Paddle/pull/63608)、[#63789](https://github.com/PaddlePaddle/Paddle/pull/63789)、[#64085](https://github.com/PaddlePaddle/Paddle/pull/64085)、[#64136](https://github.com/PaddlePaddle/Paddle/pull/64136)、[#64181](https://github.com/PaddlePaddle/Paddle/pull/64181)) -3. 
Fix the problems of compiler execution errors under dynamic and static shapes, to improve the robustness of the framework mechanism.([#60813](https://github.com/PaddlePaddle/Paddle/pull/60813)、[#61877](https://github.com/PaddlePaddle/Paddle/pull/61877)、[#61909](https://github.com/PaddlePaddle/Paddle/pull/61909)、[#62954](https://github.com/PaddlePaddle/Paddle/pull/62954)、[#63614](https://github.com/PaddlePaddle/Paddle/pull/63614)、[#60339](https://github.com/PaddlePaddle/Paddle/pull/60339)、[#60623](https://github.com/PaddlePaddle/Paddle/pull/60623)、[#60658](https://github.com/PaddlePaddle/Paddle/pull/60658)、[#60669](https://github.com/PaddlePaddle/Paddle/pull/60669)、[#58823](https://github.com/PaddlePaddle/Paddle/pull/58823)、[#62483](https://github.com/PaddlePaddle/Paddle/pull/62483)、[#62742](https://github.com/PaddlePaddle/Paddle/pull/62742)、[#61797](https://github.com/PaddlePaddle/Paddle/pull/61797)、[#63411](https://github.com/PaddlePaddle/Paddle/pull/63411)、[#64077](https://github.com/PaddlePaddle/Paddle/pull/64077)、[#62736](https://github.com/PaddlePaddle/Paddle/pull/62736)、[#62390](https://github.com/PaddlePaddle/Paddle/pull/62390)、[#63689](https://github.com/PaddlePaddle/Paddle/pull/63689)) +- Fixed bugs related to composite operators. [#70250](https://github.com/PaddlePaddle/Paddle/pull/70250), [#67170](https://github.com/PaddlePaddle/Paddle/pull/67170), [#71218](https://github.com/PaddlePaddle/Paddle/pull/71218), [#69095](https://github.com/PaddlePaddle/Paddle/pull/69095), [#70189](https://github.com/PaddlePaddle/Paddle/pull/70189) +- Fixed XPU-related bugs. [#65149](https://github.com/PaddlePaddle/Paddle/pull/65149), [#70845](https://github.com/PaddlePaddle/Paddle/pull/70845) +- Fixed shape-related bugs. [#68722](https://github.com/PaddlePaddle/Paddle/pull/68722), [#70210](https://github.com/PaddlePaddle/Paddle/pull/70210), [#70492](https://github.com/PaddlePaddle/Paddle/pull/70492) +- Fixed save/load-related bugs. 
[#69153](https://github.com/PaddlePaddle/Paddle/pull/69153) +- Fixed bugs related to types. [#65721](https://github.com/PaddlePaddle/Paddle/pull/65721), [#65859](https://github.com/PaddlePaddle/Paddle/pull/65859) +- Fixed issues in the invocation and execution of other operators, including type matching, type inference, and parameter type support. [#65360](https://github.com/PaddlePaddle/Paddle/pull/65360), [#65024](https://github.com/PaddlePaddle/Paddle/pull/65024), [#66308](https://github.com/PaddlePaddle/Paddle/pull/66308), [#67085](https://github.com/PaddlePaddle/Paddle/pull/67085), [#67285](https://github.com/PaddlePaddle/Paddle/pull/67285), [#67076](https://github.com/PaddlePaddle/Paddle/pull/67076), [#67547](https://github.com/PaddlePaddle/Paddle/pull/67547), [#68007](https://github.com/PaddlePaddle/Paddle/pull/68007), [#68527](https://github.com/PaddlePaddle/Paddle/pull/68527), [#68549](https://github.com/PaddlePaddle/Paddle/pull/68549), [#68543](https://github.com/PaddlePaddle/Paddle/pull/68543), [#68604](https://github.com/PaddlePaddle/Paddle/pull/68604), [#68741](https://github.com/PaddlePaddle/Paddle/pull/68741), [#68859](https://github.com/PaddlePaddle/Paddle/pull/68859), [#69025](https://github.com/PaddlePaddle/Paddle/pull/69025), [#69065](https://github.com/PaddlePaddle/Paddle/pull/69065), [#69405](https://github.com/PaddlePaddle/Paddle/pull/69405), [#69688](https://github.com/PaddlePaddle/Paddle/pull/69688), [#69912](https://github.com/PaddlePaddle/Paddle/pull/69912), [#70177](https://github.com/PaddlePaddle/Paddle/pull/70177), [#70517](https://github.com/PaddlePaddle/Paddle/pull/70517), [#70596](https://github.com/PaddlePaddle/Paddle/pull/70596), [#70788](https://github.com/PaddlePaddle/Paddle/pull/70788), [#70870](https://github.com/PaddlePaddle/Paddle/pull/70870), [#71332](https://github.com/PaddlePaddle/Paddle/pull/71332), [#71454](https://github.com/PaddlePaddle/Paddle/pull/71454), 
[#71442](https://github.com/PaddlePaddle/Paddle/pull/71442), [#71499](https://github.com/PaddlePaddle/Paddle/pull/71499), [#67459](https://github.com/PaddlePaddle/Paddle/pull/67459), [#68470](https://github.com/PaddlePaddle/Paddle/pull/68470), [#70206](https://github.com/PaddlePaddle/Paddle/pull/70206) -### Deprecated Features +### Others -1. Remove useless symbol-related components such as adt DimExpr, SymbolicDimExpr and ShapedTypeInterface.([#60901](https://github.com/PaddlePaddle/Paddle/pull/60901)、[#60933](https://github.com/PaddlePaddle/Paddle/pull/60933)、[#60744](https://github.com/PaddlePaddle/Paddle/pull/60744)、[#64176](https://github.com/PaddlePaddle/Paddle/pull/64176)、[#64140](https://github.com/PaddlePaddle/Paddle/pull/64140)) -2. Remove the old Group Cluster, and the front-end representation under the old IR, to improve the simplicity of the architecture.([#63683](https://github.com/PaddlePaddle/Paddle/pull/63683)、[#64630](https://github.com/PaddlePaddle/Paddle/pull/64630)、[#61380](https://github.com/PaddlePaddle/Paddle/pull/61380)) +- Optimize code style. [#68536](https://github.com/PaddlePaddle/Paddle/pull/68536) +- Fix spelling errors. 
[#67456](https://github.com/PaddlePaddle/Paddle/pull/67456), [#66673](https://github.com/PaddlePaddle/Paddle/pull/66673), [#68702](https://github.com/PaddlePaddle/Paddle/pull/68702), [#68735](https://github.com/PaddlePaddle/Paddle/pull/68735), [#68718](https://github.com/PaddlePaddle/Paddle/pull/68718), [#70700](https://github.com/PaddlePaddle/Paddle/pull/70700), [#70682](https://github.com/PaddlePaddle/Paddle/pull/70682), [#70670](https://github.com/PaddlePaddle/Paddle/pull/70670), [#70241](https://github.com/PaddlePaddle/Paddle/pull/70241), [#69626](https://github.com/PaddlePaddle/Paddle/pull/69626), [#70051](https://github.com/PaddlePaddle/Paddle/pull/70051), [#67764](https://github.com/PaddlePaddle/Paddle/pull/67764), [#68872](https://github.com/PaddlePaddle/Paddle/pull/68872), [#70055](https://github.com/PaddlePaddle/Paddle/pull/70055), [#67954](https://github.com/PaddlePaddle/Paddle/pull/67954), [#67404](https://github.com/PaddlePaddle/Paddle/pull/67404), [#69273](https://github.com/PaddlePaddle/Paddle/pull/69273), [#66981](https://github.com/PaddlePaddle/Paddle/pull/66981), [#68145](https://github.com/PaddlePaddle/Paddle/pull/68145), [#69148](https://github.com/PaddlePaddle/Paddle/pull/69148), [#69145](https://github.com/PaddlePaddle/Paddle/pull/69145), [#69168](https://github.com/PaddlePaddle/Paddle/pull/69168), [#68940](https://github.com/PaddlePaddle/Paddle/pull/68940), [#70344](https://github.com/PaddlePaddle/Paddle/pull/70344) +- Modify the interface documentation. [#69378](https://github.com/PaddlePaddle/Paddle/pull/69378) +- Replaced operator and parameter naming under the fluid operator system. 
[#69345](https://github.com/PaddlePaddle/Paddle/pull/69345), [#69382](https://github.com/PaddlePaddle/Paddle/pull/69382), [#69484](https://github.com/PaddlePaddle/Paddle/pull/69484), [#69444](https://github.com/PaddlePaddle/Paddle/pull/69444) -## Auto-Parallel Architecture +### Discarded -In order to further enhance the usability of the Auto Parallel architecture in large model training scenarios, PaddlePaddle has improved the Auto Parallel functionality in dynamic-static graphs, including the newly added parallel strategies such as sharding parallelism and interleaved pipeline parallelism, including support of lazy initialization parameters. Add and enhance the SPMD derivation rules for some of the operators. The auto-parallel architecture has been comprehensively verified in a number of mainstream large language models. Meanwhile, in order to build the new 3.0 architecture of PaddlePaddle, the static graph auto parallel architecture has been comprehensively upgraded based on PIR, the new generation intermediate representation of Paddlepaddle. It introduces DistDialect for distributed related components, and natively support DistAttr and DistTensor in the computation graph representation, and smooth the transfom from static to dynmaic graph, further enhance the unity of auto parallel usage in dynamic and static graph mode. Finally, a number of performance optimization technologies have been added and improved, including zero bubble pipeline scheduling strategy, achieving the same or even better end-to-end training performance compared to the manual parallelism on typical large models such as Llama-2 13B/70B. +- Retire the xshape output. 
[#66769](https://github.com/PaddlePaddle/Paddle/pull/66769), [#67009](https://github.com/PaddlePaddle/Paddle/pull/67009), [#67152](https://github.com/PaddlePaddle/Paddle/pull/67152), [#67172](https://github.com/PaddlePaddle/Paddle/pull/67172), [#67355](https://github.com/PaddlePaddle/Paddle/pull/67355), [#67373](https://github.com/PaddlePaddle/Paddle/pull/67373), [#66089](https://github.com/PaddlePaddle/Paddle/pull/66089) +- Remove the obsolete operators, their kernels, related unit tests, and related calling codes under the fluid system. [#67370](https://github.com/PaddlePaddle/Paddle/pull/67370), [#67088](https://github.com/PaddlePaddle/Paddle/pull/67088), [#67324](https://github.com/PaddlePaddle/Paddle/pull/67324), [#67666](https://github.com/PaddlePaddle/Paddle/pull/67666), [#68058](https://github.com/PaddlePaddle/Paddle/pull/68058), [#68311](https://github.com/PaddlePaddle/Paddle/pull/68311), [#68358](https://github.com/PaddlePaddle/Paddle/pull/68358), [#68312](https://github.com/PaddlePaddle/Paddle/pull/68312), [#68355](https://github.com/PaddlePaddle/Paddle/pull/68355), [#67528](https://github.com/PaddlePaddle/Paddle/pull/67528), [#68316](https://github.com/PaddlePaddle/Paddle/pull/68316), [#68356](https://github.com/PaddlePaddle/Paddle/pull/68356), [#68397](https://github.com/PaddlePaddle/Paddle/pull/68397), [#68441](https://github.com/PaddlePaddle/Paddle/pull/68441), [#68417](https://github.com/PaddlePaddle/Paddle/pull/68417), [#68567](https://github.com/PaddlePaddle/Paddle/pull/68567), [#68583](https://github.com/PaddlePaddle/Paddle/pull/68583), [#68649](https://github.com/PaddlePaddle/Paddle/pull/68649), [#68331](https://github.com/PaddlePaddle/Paddle/pull/68331), [#68730](https://github.com/PaddlePaddle/Paddle/pull/68730), [#69754](https://github.com/PaddlePaddle/Paddle/pull/69754), [#69445](https://github.com/PaddlePaddle/Paddle/pull/69445), [#69921](https://github.com/PaddlePaddle/Paddle/pull/69921), 
[#70268](https://github.com/PaddlePaddle/Paddle/pull/70268), [#69446](https://github.com/PaddlePaddle/Paddle/pull/69446), [#69544](https://github.com/PaddlePaddle/Paddle/pull/69544), [#70272](https://github.com/PaddlePaddle/Paddle/pull/70272), [#69745](https://github.com/PaddlePaddle/Paddle/pull/69745), [#70300](https://github.com/PaddlePaddle/Paddle/pull/70300), [#70388](https://github.com/PaddlePaddle/Paddle/pull/70388), [#70421](https://github.com/PaddlePaddle/Paddle/pull/70421), [#70302](https://github.com/PaddlePaddle/Paddle/pull/70302), [#70445](https://github.com/PaddlePaddle/Paddle/pull/70445), [#69275](https://github.com/PaddlePaddle/Paddle/pull/69275), [#69081](https://github.com/PaddlePaddle/Paddle/pull/69081), [#70588](https://github.com/PaddlePaddle/Paddle/pull/70588), [#67778](https://github.com/PaddlePaddle/Paddle/pull/67778), [#67953](https://github.com/PaddlePaddle/Paddle/pull/67953), [#68093](https://github.com/PaddlePaddle/Paddle/pull/68093), [#68092](https://github.com/PaddlePaddle/Paddle/pull/68092), [#67684](https://github.com/PaddlePaddle/Paddle/pull/67684), [#69665](https://github.com/PaddlePaddle/Paddle/pull/69665), [#67915](https://github.com/PaddlePaddle/Paddle/pull/67915), [#67917](https://github.com/PaddlePaddle/Paddle/pull/67917), [#68403](https://github.com/PaddlePaddle/Paddle/pull/68403), [#68404](https://github.com/PaddlePaddle/Paddle/pull/68404), [#68969](https://github.com/PaddlePaddle/Paddle/pull/68969), [#68953](https://github.com/PaddlePaddle/Paddle/pull/68953), [#68954](https://github.com/PaddlePaddle/Paddle/pull/68954), [#68942](https://github.com/PaddlePaddle/Paddle/pull/68942), [#68950](https://github.com/PaddlePaddle/Paddle/pull/68950), [#69381](https://github.com/PaddlePaddle/Paddle/pull/69381), [#69380](https://github.com/PaddlePaddle/Paddle/pull/69380), [#69448](https://github.com/PaddlePaddle/Paddle/pull/69448), [#69680](https://github.com/PaddlePaddle/Paddle/pull/69680), 
[#69775](https://github.com/PaddlePaddle/Paddle/pull/69775), [#69812](https://github.com/PaddlePaddle/Paddle/pull/69812), [#69840](https://github.com/PaddlePaddle/Paddle/pull/69840), [#69828](https://github.com/PaddlePaddle/Paddle/pull/69828), [#69742](https://github.com/PaddlePaddle/Paddle/pull/69742), [#69923](https://github.com/PaddlePaddle/Paddle/pull/69923), [#69922](https://github.com/PaddlePaddle/Paddle/pull/69922), [#69904](https://github.com/PaddlePaddle/Paddle/pull/69904), [#70002](https://github.com/PaddlePaddle/Paddle/pull/70002), [#70054](https://github.com/PaddlePaddle/Paddle/pull/70054), [#70052](https://github.com/PaddlePaddle/Paddle/pull/70052), [#70053](https://github.com/PaddlePaddle/Paddle/pull/70053), [#70713](https://github.com/PaddlePaddle/Paddle/pull/70713), [#70718](https://github.com/PaddlePaddle/Paddle/pull/70718), [#70717](https://github.com/PaddlePaddle/Paddle/pull/70717) +- Remove deprecated flags. [#70727](https://github.com/PaddlePaddle/Paddle/pull/70727), [#70726](https://github.com/PaddlePaddle/Paddle/pull/70726) +- Remove the deprecated API of composite operators. [#69873](https://github.com/PaddlePaddle/Paddle/pull/69873), [#69309](https://github.com/PaddlePaddle/Paddle/pull/69309) -### Function Improvements +### Developer-related -- Add the dtensor_from_local interface for creating DistTensor from local tensor after sharding (correspondingly, shard_tensor is the created DistTensor from global tensor before sharding). [#60206](https://github.com/PaddlePaddle/Paddle/pull/60206) -- Add the unshard_tensor interface to convert DistTensor to global tensor, which is reciprocal operation to shard_tensor. [#60272](https://github.com/PaddlePaddle/Paddle/pull/60272) -- To reduce the GPU memory usage during training, add Sharding parallelism, and support stage1, stage2 and stage3 modes. 
[#61926](https://github.com/PaddlePaddle/Paddle/pull/61926), [#62711](https://github.com/PaddlePaddle/Paddle/pull/62711), [#62486](https://github.com/PaddlePaddle/Paddle/pull/62486), [#62230](https://github.com/PaddlePaddle/Paddle/pull/62230) -- To solve the problem of insufficient GPU memory when initializing parameters first and then sharding them, add the LazyInit function, to support slicing parameters first and then initializing them. [#60316](https://github.com/PaddlePaddle/Paddle/pull/60316), [#60441](https://github.com/PaddlePaddle/Paddle/pull/60441), [#60563](https://github.com/PaddlePaddle/Paddle/pull/60563), [#61792](https://github.com/PaddlePaddle/Paddle/pull/61792) -- In order to reduce the bubble of pipeline parallel, add the interleaved pipeline parallel parallelism has been added, and support automatically converting the pipeline parallel of the user's networking to interleaved pipeline parallel through configuration, so that the user doesn't need to perform complicated marking in the networking. [#59751](https://github.com/PaddlePaddle/Paddle/pull/59751), [#60050](https://github.com/PaddlePaddle/Paddle/pull/60050), [#60467](https://github.com/PaddlePaddle/Paddle/pull/60467), [#60868](https://github.com/PaddlePaddle/Paddle/pull/60868), [#60187](https://github.com/PaddlePaddle/Paddle/pull/60187), [#62884](https://github.com/PaddlePaddle/Paddle/pull/62884), [#60560](https://github.com/PaddlePaddle/Paddle/pull/60560), [#61541](https://github.com/PaddlePaddle/Paddle/pull/61541) -- Add the SPMD derivation rules for stack, gather, scatter_grad, cumsum, unbind, swiglu, and fused_linear_param_grad. Improve and optimize the implementation of fused_rope, reshape, flatten, fused_rms_norm, slice, tile, flash_attn, cross_entropy and other operator slice derivation rules, to solve the problem of incompatibility in some of the model networking scenarios. 
[#62720](https://github.com/PaddlePaddle/Paddle/pull/62720), [#64202](https://github.com/PaddlePaddle/Paddle/pull/64202), [#63361](https://github.com/PaddlePaddle/Paddle/pull/63361), [#63290](https://github.com/PaddlePaddle/Paddle/pull/63290), [#61460](https://github.com/PaddlePaddle/Paddle/pull/61460), [#59986](https://github.com/PaddlePaddle/Paddle/pull/59986), [#61184](https://github.com/PaddlePaddle/Paddle/pull/61184), [#60144](https://github.com/PaddlePaddle/Paddle/pull/60144), [#62525](https://github.com/PaddlePaddle/Paddle/pull/62525), [#62053](https://github.com/PaddlePaddle/Paddle/pull/62053), [#60709](https://github.com/PaddlePaddle/Paddle/pull/60709), [#60111](https://github.com/PaddlePaddle/Paddle/pull/60111), [#63681](https://github.com/PaddlePaddle/Paddle/pull/63681), [#62180](https://github.com/PaddlePaddle/Paddle/pull/62180), [#60794](https://github.com/PaddlePaddle/Paddle/pull/60794), [#60632](https://github.com/PaddlePaddle/Paddle/pull/60632), [#62439](https://github.com/PaddlePaddle/Paddle/pull/62439) -- Improve the distributed checkpoint storage and loading function, support master_weights strategy, and fix the random hanging problem. [#60027](https://github.com/PaddlePaddle/Paddle/pull/60027), [#59872](https://github.com/PaddlePaddle/Paddle/pull/59872) -- In order to support the auto parallel of arbitrary shape tensor, add the non-uniform tensor sharding feature. [#62611](https://github.com/PaddlePaddle/Paddle/pull/62611), [#61432](https://github.com/PaddlePaddle/Paddle/pull/61432) -- In order to support users to use customized operators in the auto parallel networking, support user registration outside the framework to customize the SPMD derivation rules for this class of operators. [#60509](https://github.com/PaddlePaddle/Paddle/pull/60509) -- Improve the slice SPMD rule, and support the transition from any state to replicate and from replicate state to any state. 
[#60281](https://github.com/PaddlePaddle/Paddle/pull/60281), [#59869](https://github.com/PaddlePaddle/Paddle/pull/59869) -- Add MoE expert parallelism (experimental). Currently, only dynamic graph auto parallel is supported. [#63904](https://github.com/PaddlePaddle/Paddle/pull/63904) -- Fix some process adaptation problems of auto parallel and dynamic diagram execution, and dynamic to static. [#60214](https://github.com/PaddlePaddle/Paddle/pull/60214), [#60546](https://github.com/PaddlePaddle/Paddle/pull/60546), [#62082](https://github.com/PaddlePaddle/Paddle/pull/62082), [#61313](https://github.com/PaddlePaddle/Paddle/pull/61313), [#61840](https://github.com/PaddlePaddle/Paddle/pull/61840), [#60614](https://github.com/PaddlePaddle/Paddle/pull/60614), [#60234](https://github.com/PaddlePaddle/Paddle/pull/60234), [#64813](https://github.com/PaddlePaddle/Paddle/pull/64813), [#61606](https://github.com/PaddlePaddle/Paddle/pull/61606), [#63405](https://github.com/PaddlePaddle/Paddle/pull/63405), [#64334](https://github.com/PaddlePaddle/Paddle/pull/64334), [#60504](https://github.com/PaddlePaddle/Paddle/pull/60504) +- Support for composition operators, including adapter operators, adding flags, test cases, etc. [#67725](https://github.com/PaddlePaddle/Paddle/pull/67725), [#65252](https://github.com/PaddlePaddle/Paddle/pull/65252), [#67590](https://github.com/PaddlePaddle/Paddle/pull/67590), [#68076](https://github.com/PaddlePaddle/Paddle/pull/68076), [#66711](https://github.com/PaddlePaddle/Paddle/pull/66711), [#68813](https://github.com/PaddlePaddle/Paddle/pull/68813), [#68928](https://github.com/PaddlePaddle/Paddle/pull/68928), [#69054](https://github.com/PaddlePaddle/Paddle/pull/69054), [#69156](https://github.com/PaddlePaddle/Paddle/pull/69156), [#69255](https://github.com/PaddlePaddle/Paddle/pull/69255), [#69460](https://github.com/PaddlePaddle/Paddle/pull/69460), [#70270](https://github.com/PaddlePaddle/Paddle/pull/70270) +- Add unit tests for operators. 
[#68272](https://github.com/PaddlePaddle/Paddle/pull/68272), [#68490](https://github.com/PaddlePaddle/Paddle/pull/68490) +- Added operator API aliases for PaddleCustomDevice. [#69526](https://github.com/PaddlePaddle/Paddle/pull/69526) +- Define the position of the shift operator to ensure it only supports dynamic graphs. [#69289](https://github.com/PaddlePaddle/Paddle/pull/69289) +- Annotate only forward computation operators. [#68580](https://github.com/PaddlePaddle/Paddle/pull/68580) +- Change the inverse operator of the view operation to reuse the forward operator, thereby supporting the need for higher-order differentiation in scientific computing scenarios. [#71086](https://github.com/PaddlePaddle/Paddle/pull/71086) +- Migrate operator file location/modify function namespace/modify function parameter names, etc. [#66393](https://github.com/PaddlePaddle/Paddle/pull/66393), [#67066](https://github.com/PaddlePaddle/Paddle/pull/67066), [#67012](https://github.com/PaddlePaddle/Paddle/pull/67012), [#67243](https://github.com/PaddlePaddle/Paddle/pull/67243), [#67367](https://github.com/PaddlePaddle/Paddle/pull/67367), [#67760](https://github.com/PaddlePaddle/Paddle/pull/67760), [#67242](https://github.com/PaddlePaddle/Paddle/pull/67242), [#67189](https://github.com/PaddlePaddle/Paddle/pull/67189), [#67899](https://github.com/PaddlePaddle/Paddle/pull/67899), [#67687](https://github.com/PaddlePaddle/Paddle/pull/67687), [#68035](https://github.com/PaddlePaddle/Paddle/pull/68035), [#67682](https://github.com/PaddlePaddle/Paddle/pull/67682), [#68464](https://github.com/PaddlePaddle/Paddle/pull/68464), [#68469](https://github.com/PaddlePaddle/Paddle/pull/68469), [#67900](https://github.com/PaddlePaddle/Paddle/pull/67900), [#68563](https://github.com/PaddlePaddle/Paddle/pull/68563), [#68562](https://github.com/PaddlePaddle/Paddle/pull/68562), [#68564](https://github.com/PaddlePaddle/Paddle/pull/68564), [#68479](https://github.com/PaddlePaddle/Paddle/pull/68479), 
[#68588](https://github.com/PaddlePaddle/Paddle/pull/68588), [#68726](https://github.com/PaddlePaddle/Paddle/pull/68726), [#68719](https://github.com/PaddlePaddle/Paddle/pull/68719), [#68767](https://github.com/PaddlePaddle/Paddle/pull/68767), [#68557](https://github.com/PaddlePaddle/Paddle/pull/68557), [#68671](https://github.com/PaddlePaddle/Paddle/pull/68671), [#68786](https://github.com/PaddlePaddle/Paddle/pull/68786), [#67948](https://github.com/PaddlePaddle/Paddle/pull/67948), [#64999](https://github.com/PaddlePaddle/Paddle/pull/64999), [#68581](https://github.com/PaddlePaddle/Paddle/pull/68581), [#68361](https://github.com/PaddlePaddle/Paddle/pull/68361), [#68656](https://github.com/PaddlePaddle/Paddle/pull/68656), [#68396](https://github.com/PaddlePaddle/Paddle/pull/68396), [#68059](https://github.com/PaddlePaddle/Paddle/pull/68059), [#68785](https://github.com/PaddlePaddle/Paddle/pull/68785), [#68665](https://github.com/PaddlePaddle/Paddle/pull/68665), [#68869](https://github.com/PaddlePaddle/Paddle/pull/68869), [#67626](https://github.com/PaddlePaddle/Paddle/pull/67626), [#68921](https://github.com/PaddlePaddle/Paddle/pull/68921), [#69268](https://github.com/PaddlePaddle/Paddle/pull/69268), [#69271](https://github.com/PaddlePaddle/Paddle/pull/69271), [#69306](https://github.com/PaddlePaddle/Paddle/pull/69306), [#69302](https://github.com/PaddlePaddle/Paddle/pull/69302), [#69341](https://github.com/PaddlePaddle/Paddle/pull/69341), [#69364](https://github.com/PaddlePaddle/Paddle/pull/69364), [#69343](https://github.com/PaddlePaddle/Paddle/pull/69343), [#69383](https://github.com/PaddlePaddle/Paddle/pull/69383), [#69415](https://github.com/PaddlePaddle/Paddle/pull/69415), [#69437](https://github.com/PaddlePaddle/Paddle/pull/69437), [#69494](https://github.com/PaddlePaddle/Paddle/pull/69494), [#69541](https://github.com/PaddlePaddle/Paddle/pull/69541), [#69543](https://github.com/PaddlePaddle/Paddle/pull/69543), 
[#69540](https://github.com/PaddlePaddle/Paddle/pull/69540), [#69569](https://github.com/PaddlePaddle/Paddle/pull/69569), [#69568](https://github.com/PaddlePaddle/Paddle/pull/69568), [#69621](https://github.com/PaddlePaddle/Paddle/pull/69621), [#69622](https://github.com/PaddlePaddle/Paddle/pull/69622), [#69701](https://github.com/PaddlePaddle/Paddle/pull/69701), [#69702](https://github.com/PaddlePaddle/Paddle/pull/69702), [#69704](https://github.com/PaddlePaddle/Paddle/pull/69704), [#69743](https://github.com/PaddlePaddle/Paddle/pull/69743), [#69780](https://github.com/PaddlePaddle/Paddle/pull/69780), [#69814](https://github.com/PaddlePaddle/Paddle/pull/69814), [#69822](https://github.com/PaddlePaddle/Paddle/pull/69822), [#69893](https://github.com/PaddlePaddle/Paddle/pull/69893), [#69967](https://github.com/PaddlePaddle/Paddle/pull/69967), [#69976](https://github.com/PaddlePaddle/Paddle/pull/69976), [#70011](https://github.com/PaddlePaddle/Paddle/pull/70011), [#70015](https://github.com/PaddlePaddle/Paddle/pull/70015), [#70007](https://github.com/PaddlePaddle/Paddle/pull/70007), [#70010](https://github.com/PaddlePaddle/Paddle/pull/70010), [#70346](https://github.com/PaddlePaddle/Paddle/pull/70346), [#70414](https://github.com/PaddlePaddle/Paddle/pull/70414), [#69951](https://github.com/PaddlePaddle/Paddle/pull/69951), [#70299](https://github.com/PaddlePaddle/Paddle/pull/70299), [#70441](https://github.com/PaddlePaddle/Paddle/pull/70441), [#70435](https://github.com/PaddlePaddle/Paddle/pull/70435), [#68420](https://github.com/PaddlePaddle/Paddle/pull/68420), [#70671](https://github.com/PaddlePaddle/Paddle/pull/70671), [#70705](https://github.com/PaddlePaddle/Paddle/pull/70705), [#68540](https://github.com/PaddlePaddle/Paddle/pull/68540), [#70211](https://github.com/PaddlePaddle/Paddle/pull/70211), [#67489](https://github.com/PaddlePaddle/Paddle/pull/67489), [#66927](https://github.com/PaddlePaddle/Paddle/pull/66927), 
[#66942](https://github.com/PaddlePaddle/Paddle/pull/66942), [#66848](https://github.com/PaddlePaddle/Paddle/pull/66848), [#66796](https://github.com/PaddlePaddle/Paddle/pull/66796), [#67036](https://github.com/PaddlePaddle/Paddle/pull/67036), [#67244](https://github.com/PaddlePaddle/Paddle/pull/67244), [#67299](https://github.com/PaddlePaddle/Paddle/pull/67299), [#67171](https://github.com/PaddlePaddle/Paddle/pull/67171), [#67293](https://github.com/PaddlePaddle/Paddle/pull/67293), [#67208](https://github.com/PaddlePaddle/Paddle/pull/67208), [#67408](https://github.com/PaddlePaddle/Paddle/pull/67408), [#67523](https://github.com/PaddlePaddle/Paddle/pull/67523), [#67689](https://github.com/PaddlePaddle/Paddle/pull/67689), [#67694](https://github.com/PaddlePaddle/Paddle/pull/67694), [#67797](https://github.com/PaddlePaddle/Paddle/pull/67797), [#67894](https://github.com/PaddlePaddle/Paddle/pull/67894), [#65969](https://github.com/PaddlePaddle/Paddle/pull/65969), [#65939](https://github.com/PaddlePaddle/Paddle/pull/65939), [#67928](https://github.com/PaddlePaddle/Paddle/pull/67928), [#68097](https://github.com/PaddlePaddle/Paddle/pull/68097), [#66744](https://github.com/PaddlePaddle/Paddle/pull/66744), [#68496](https://github.com/PaddlePaddle/Paddle/pull/68496), [#66943](https://github.com/PaddlePaddle/Paddle/pull/66943), [#68773](https://github.com/PaddlePaddle/Paddle/pull/68773), [#69272](https://github.com/PaddlePaddle/Paddle/pull/69272) +- Move test file locations. [#67564](https://github.com/PaddlePaddle/Paddle/pull/67564), [#68266](https://github.com/PaddlePaddle/Paddle/pull/68266), [#68634](https://github.com/PaddlePaddle/Paddle/pull/68634) +- Pre-modification related to xshape output exit. 
[#67543](https://github.com/PaddlePaddle/Paddle/pull/67543), [#67572](https://github.com/PaddlePaddle/Paddle/pull/67572) -### Performance Optimization +### Improvement -- In order to reduce the bubble in pipeline parallel, support the reverse computation of parameter and activation splitting in backward, and add zero bubble pipeline scheduling strategy to improve the training performance. [#62865](https://github.com/PaddlePaddle/Paddle/pull/62865), [#62737](https://github.com/PaddlePaddle/Paddle/pull/62737), [#64534](https://github.com/PaddlePaddle/Paddle/pull/64534), -- To improve the performance of sequence parallel, perform fusion on related communication operations and computation operations, and optimize redundant transopse operations. [#64807](https://github.com/PaddlePaddle/Paddle/pull/64807), [#63948](https://github.com/PaddlePaddle/Paddle/pull/63948), [#64316](https://github.com/PaddlePaddle/Paddle/pull/64316), [#64119](https://github.com/PaddlePaddle/Paddle/pull/64119) -- Optimize the time consumption of auto parallel graph optimization for static graphs, to reduce the delay from the start of training to the completion of the first step. [#59912](https://github.com/PaddlePaddle/Paddle/pull/59912), [#61817](https://github.com/PaddlePaddle/Paddle/pull/61817), [#60022](https://github.com/PaddlePaddle/Paddle/pull/60022), [#60125](https://github.com/PaddlePaddle/Paddle/pull/60125) -- Optimize the time consumption of related communication operations in hybrid parallel scenarios. [#62157](https://github.com/PaddlePaddle/Paddle/pull/62157), [#61622](https://github.com/PaddlePaddle/Paddle/pull/61622) -- Optimize the redundant video memory consumption of parameters under the auto parallel dynamic-to-static. [#62746](https://github.com/PaddlePaddle/Paddle/pull/62746) -- Improve the hybrid precision training function of auto parallel, support the setting of local auto_cast and black/white list, support master grad function, and adapt to different parallel strategies. 
[60158](https://github.com/PaddlePaddle/Paddle/pull/60158), [#59987](https://github.com/PaddlePaddle/Paddle/pull/59987), [#62629](https://github.com/PaddlePaddle/Paddle/pull/62629), [#60385](https://github.com/PaddlePaddle/Paddle/pull/60385), [#62015](https://github.com/PaddlePaddle/Paddle/pull/62015), [#60514](https://github.com/PaddlePaddle/Paddle/pull/60514), [#61221](https://github.com/PaddlePaddle/Paddle/pull/61221), [#60779](https://github.com/PaddlePaddle/Paddle/pull/60779), [#63228](https://github.com/PaddlePaddle/Paddle/pull/63228) -- Optimize non-essential casts caused by type promotion and amp to improve performance. [#63293](https://github.com/PaddlePaddle/Paddle/pull/63293), [#63228](https://github.com/PaddlePaddle/Paddle/pull/63228) +- Supported more data types. [#69143](https://github.com/PaddlePaddle/Paddle/pull/69143) +- Update xpu interface. [#69800](https://github.com/PaddlePaddle/Paddle/pull/69800) +- Improved operator printing functionality. [#69916](https://github.com/PaddlePaddle/Paddle/pull/69916) +- Upgraded the normalize operation to support more scenarios. [#70152](https://github.com/PaddlePaddle/Paddle/pull/70152) +- Extended group_norm to handle cases where the rank is greater than 5. [#68774](https://github.com/PaddlePaddle/Paddle/pull/68774) +- Improved the usage of backward_blacklist. [#69356](https://github.com/PaddlePaddle/Paddle/pull/69356) -### Upgrade Static Graph Auto Parallel Architecture +### Performance improvement -- Based on the new generation of Intermediate Representation(PIR), add the new DistDialect, natively supporting DistAttr and DistTensor in computation graph representation, and realizing the direct binding of distributed attributes between tensor or operator, which making the auto-parallel architecture more simple and unified. 
[#63828](https://github.com/PaddlePaddle/Paddle/pull/63828), [#64299](https://github.com/PaddlePaddle/Paddle/pull/64299), [#63870](https://github.com/PaddlePaddle/Paddle/pull/63870), [#64144](https://github.com/PaddlePaddle/Paddle/pull/64144), [#62524](https://github.com/PaddlePaddle/Paddle/pull/62524), [#62630](https://github.com/PaddlePaddle/Paddle/pull/62630), [#62897](https://github.com/PaddlePaddle/Paddle/pull/62897), [#60478](https://github.com/PaddlePaddle/Paddle/pull/60478), [#60574](https://github.com/PaddlePaddle/Paddle/pull/60574), [#63876](https://github.com/PaddlePaddle/Paddle/pull/63876), [#63798](https://github.com/PaddlePaddle/Paddle/pull/63798), [#62560](https://github.com/PaddlePaddle/Paddle/pull/62560), [#63676](https://github.com/PaddlePaddle/Paddle/pull/63676) -- Improve APIs such as shard_tensor, reshard, and to_static, to support users to convert the dynamic graph model networking directly into PIR static computation graph for better performance. [#62945](https://github.com/PaddlePaddle/Paddle/pull/62945), [#62356](https://github.com/PaddlePaddle/Paddle/pull/62356), [#60175](https://github.com/PaddlePaddle/Paddle/pull/60175), [#62654](https://github.com/PaddlePaddle/Paddle/pull/62654), [#63347](https://github.com/PaddlePaddle/Paddle/pull/63347) -- Optimize the auto-parallel graph optimization compilation process, and reduce the compilation and optimization time of static graphs by refactoring and optimizing the procedure of computation graph parallelization and communication resolution. [#64137](https://github.com/PaddlePaddle/Paddle/pull/64137), [#62201](https://github.com/PaddlePaddle/Paddle/pull/62201), [#64143](https://github.com/PaddlePaddle/Paddle/pull/64143), [#62560](https://github.com/PaddlePaddle/Paddle/pull/62560) -- Optimize the procedure of the SPMD derivation in static graphs to achieve the consistency results under dynamic-static graphs, which improves the unity and stability of the architecture. 
[#62659](https://github.com/PaddlePaddle/Paddle/pull/62659), [#62547](https://github.com/PaddlePaddle/Paddle/pull/62547), [#63117](https://github.com/PaddlePaddle/Paddle/pull/63117), [#63434](https://github.com/PaddlePaddle/Paddle/pull/63434), [#63770](https://github.com/PaddlePaddle/Paddle/pull/63770), [#64361](https://github.com/PaddlePaddle/Paddle/pull/64361), [#63073](https://github.com/PaddlePaddle/Paddle/pull/63073) -- Upgrade the implementation of Reshard conversion in static graphs, and use consistent conversion rules under dynamic-static graphs to ensure the consistency of the execution logic and results of tensor reshard conversion in dynamic-static graphs, so as to improve user experience. [#62718](https://github.com/PaddlePaddle/Paddle/pull/62718), [#62694](https://github.com/PaddlePaddle/Paddle/pull/62694), [#60215](https://github.com/PaddlePaddle/Paddle/pull/60215), [#63362](https://github.com/PaddlePaddle/Paddle/pull/63362), [#63072](https://github.com/PaddlePaddle/Paddle/pull/63072), [#63962](https://github.com/PaddlePaddle/Paddle/pull/63962), [#64223](https://github.com/PaddlePaddle/Paddle/pull/64223), [#61796](https://github.com/PaddlePaddle/Paddle/pull/61796), [#64465](https://github.com/PaddlePaddle/Paddle/pull/64465), [#64623](https://github.com/PaddlePaddle/Paddle/pull/64623), [#64418](https://github.com/PaddlePaddle/Paddle/pull/64418) +- Optimized the performance of the `where_double_grad` operator. [#70404](https://github.com/PaddlePaddle/Paddle/pull/70404) +- Change "for range" to "slice" to speed up the execution of grad. [#69938](https://github.com/PaddlePaddle/Paddle/pull/69938) -### Automatic Search and Tuning of Training Strategies +## 6. 
Framework performance optimization -In order to improve the ease of use of the training strategy automatic search and tuning tool (AutoTuner), support user-defined search items, support for setting the priority of search items, and support for user-configured illegal strategy combinations, to comprehensively enhance the error reporting information in the runtime and post-run logs, and support for AutoTuner on NPU devices. [#60101](https://github.com/PaddlePaddle/Paddle/pull/60101), [#60294](https://github.com/PaddlePaddle/Paddle/pull/60294), [#61898](https://github.com/PaddlePaddle/Paddle/pull/61898), [#60248](https://github.com/PaddlePaddle/Paddle/pull/60248), [#60417](https://github.com/PaddlePaddle/Paddle/pull/60417), [#60954](https://github.com/PaddlePaddle/Paddle/pull/60954), [#61499](https://github.com/PaddlePaddle/Paddle/pull/61499), [#62724](https://github.com/PaddlePaddle/Paddle/pull/62724), [#60954](https://github.com/PaddlePaddle/Paddle/pull/60954), [#63693](https://github.com/PaddlePaddle/Paddle/pull/63693), [#62853](https://github.com/PaddlePaddle/Paddle/pull/62853), [#62984](https://github.com/PaddlePaddle/Paddle/pull/62984) +This section covers performance-related PRs: operator and kernel performance optimizations, memory usage optimizations, and namespace cleanup, all aimed at giving users a better development experience. -## Cuda Training Performance Optimization +### New Features -This upgrade achieves the improvement of large model training efficiency from multiple perspectives, such as operator computation efficiency, distributed communication optimization, and video memory optimization. +- Enhanced support for fp8 type. [#64735](https://github.com/PaddlePaddle/Paddle/pull/64735), [#64955](https://github.com/PaddlePaddle/Paddle/pull/64955) +- Enhanced support for XPU. 
[#65362](https://github.com/PaddlePaddle/Paddle/pull/65362), [#65304](https://github.com/PaddlePaddle/Paddle/pull/65304), [#68451](https://github.com/PaddlePaddle/Paddle/pull/68451) +- Enhanced support for DCU. [#65398](https://github.com/PaddlePaddle/Paddle/pull/65398), [#65857](https://github.com/PaddlePaddle/Paddle/pull/65857), [#66423](https://github.com/PaddlePaddle/Paddle/pull/66423) +- Expand the capabilities of oneDNN. [#66000](https://github.com/PaddlePaddle/Paddle/pull/66000), [#66474](https://github.com/PaddlePaddle/Paddle/pull/66474), [#66568](https://github.com/PaddlePaddle/Paddle/pull/66568) +- Rename parameters and support more complex masks. [#65409](https://github.com/PaddlePaddle/Paddle/pull/65409) +- Support for flash-attention. [#68968](https://github.com/PaddlePaddle/Paddle/pull/68968) +- Support OpenVINO CPU high-performance inference. [#69122](https://github.com/PaddlePaddle/Paddle/pull/69122) -### Function Improvements +### Functional improvements -- Enhance the FlashAttention operator function, including support for NVIDIA SM90 GPU compilation, support for Group Query Attention, support for cuDNN access, support for QKV-packed form inputs, and so on. [#59820](https://github.com/PaddlePaddle/Paddle/pull/59820),[#60776](https://github.com/PaddlePaddle/Paddle/pull/60776),[#58680](https://github.com/PaddlePaddle/Paddle/pull/58680),[#63289](https://github.com/PaddlePaddle/Paddle/pull/63289) -- In the Repeat_interleave operator, add support for BFloat16 data type. [#61854](https://github.com/PaddlePaddle/Paddle/pull/61854) -- For the issues of many interface parameters of ResNet-like models such as fused_scale_bias_add_relu, fused_scale_bias_relu_conv_bn, and fused_dconv_drelu_dbn, and the ease of use of operators, add the fuse_resunit pass, to support automatic fusion of the abovementioned operators, to achieve generic performance optimization. 
([#59771](https://github.com/PaddlePaddle/Paddle/pull/59771)) +- Enhance PIR pass to achieve better fusion. [#65540](https://github.com/PaddlePaddle/Paddle/pull/65540) +- Enhanced OneDNN functionality. [#65971](https://github.com/PaddlePaddle/Paddle/pull/65971), [#70430](https://github.com/PaddlePaddle/Paddle/pull/70430), [#70630](https://github.com/PaddlePaddle/Paddle/pull/70630), [#70871](https://github.com/PaddlePaddle/Paddle/pull/70871) +- Improve the performance of FlashMask. [#68109](https://github.com/PaddlePaddle/Paddle/pull/68109) +- Optimize kernel performance. [#69660](https://github.com/PaddlePaddle/Paddle/pull/69660), [#69596](https://github.com/PaddlePaddle/Paddle/pull/69596) +- Combinatorial operator optimization. [#69515](https://github.com/PaddlePaddle/Paddle/pull/69515), [#69616](https://github.com/PaddlePaddle/Paddle/pull/69616) -### Performance Improvement +### Bug Fixes -- To address the problem of large GPU memory consumption during the computation of SwiGLU activation module of the Llama models, add the SwiGLU fusion operator to save the memory consumption of intermediate variables, thus reducing the memory overhead during the training process of the large model, and reducing the recomputation to improve the performance. The performance of the Llama-70B model is improved by 9%. [#61508](https://github.com/PaddlePaddle/Paddle/pull/61508) -- To address the problem of higher percentage of communications in Sequence Parallel, realize the overlap between Sequence Parallel reverse process communication and Matmul computation, saving the end-to-end time consumption and improving the end-to-end performance of large model training scenarios by 1%~2%. 
[#62284](https://github.com/PaddlePaddle/Paddle/pull/62284),[#63531](https://github.com/PaddlePaddle/Paddle/pull/63531) -- For the problem of slow training speed due to the need to divide by nranks after sharding reverse communications, support the fusion of reverse communication and division by nranks operation, and support the mode of ReduceScatter Average, to improve the performance of large model training. [#62623](https://github.com/PaddlePaddle/Paddle/pull/62623) -- For the problem of jitter training speed caused by the input data broadcasting process of the tensor model parallel process, fix the unnecessary synchronization between CPU and GPU in the data broadcasting, to ensure the stability of the training speed. [#60816](https://github.com/PaddlePaddle/Paddle/pull/60816) -- For the problem of low training speed due to the long parallel P2P communication time of pipelined models, realize the overlap of P2P communication and forward-backward computation. The end-to-end training performance of large models is improved by 2%~3%. [#61935](https://github.com/PaddlePaddle/Paddle/pull/61935),[#62051](https://github.com/PaddlePaddle/Paddle/pull/62051,[#62051](https://github.com/PaddlePaddle/Paddle/pull/62051)) -- For the problem of low inefficiency of bias gradient computation of fused_linear_param_grad_add operator, optimize the computation efficiency of bias gradient computation, and improve the end-to-end training performance of large model by 0.2%. [#63114](https://github.com/PaddlePaddle/Paddle/pull/63114) -- For the problem of long time-consuming parameter broadcasting process after the end of sharding reverse computation, implement the overlap between parameter broadcasting and next step computation. As a result, the end-to-end training performance of large model is improved by more than 2%. 
[#63945](https://github.com/PaddlePaddle/Paddle/pull/63945) -- To address the problem that the gradient occupies too much video memory during the pipelined parallel training, as a result of slow training speed due to the introduction of multiple computations, we have implemented the gradient dynamic release technique, to improve the end-to-end training performance of large models by 3.4%. [#59739](https://github.com/PaddlePaddle/Paddle/pull/59739) +- Fixed bugs related to PIR, CINN, SOT, OneDNN, etc. [#68951](https://github.com/PaddlePaddle/Paddle/pull/68951), [#69553](https://github.com/PaddlePaddle/Paddle/pull/69553), [#69682](https://github.com/PaddlePaddle/Paddle/pull/69682), [#67741](https://github.com/PaddlePaddle/Paddle/pull/67741), [#69346](https://github.com/PaddlePaddle/Paddle/pull/69346), [#69401](https://github.com/PaddlePaddle/Paddle/pull/69401), [#68903](https://github.com/PaddlePaddle/Paddle/pull/68903) +- Fixed bugs related to composite operators. [#69479](https://github.com/PaddlePaddle/Paddle/pull/69479), [#69487](https://github.com/PaddlePaddle/Paddle/pull/69487), [#67176](https://github.com/PaddlePaddle/Paddle/pull/67176) +- Fixed the issue with the FP8 data type on the CPU. [#65539](https://github.com/PaddlePaddle/Paddle/pull/65539) +- Remove unnecessary overhead for creating events in computational flow. [#67315](https://github.com/PaddlePaddle/Paddle/pull/67247) +- Fixed performance issues. [#68378](https://github.com/PaddlePaddle/Paddle/pull/68378) +- Fixed issues related to types. [#69720](https://github.com/PaddlePaddle/Paddle/pull/69720) +- Fixed other issues. 
[#70019](https://github.com/PaddlePaddle/Paddle/pull/70019), [#70008](https://github.com/PaddlePaddle/Paddle/pull/70008), [#70645](https://github.com/PaddlePaddle/Paddle/pull/70645), [#71209](https://github.com/PaddlePaddle/Paddle/pull/71209), [#68152](https://github.com/PaddlePaddle/Paddle/pull/68152), [#69907](https://github.com/PaddlePaddle/Paddle/pull/69907), [#71207](https://github.com/PaddlePaddle/Paddle/pull/71207) -### Bug Fixing +### Performance optimization -- Fix the problem of StreamSafeCUDAAllocator CUDA Event resource leakage, as a result of slowdown of large model training. [#64621](https://github.com/PaddlePaddle/Paddle/pull/64621) -- Fix the bug of reverse calculation error of fused_rotary_position_embedding operator. [#60217](https://github.com/PaddlePaddle/Paddle/pull/60217) -- Fix the bug that customized operators cannot control the calculation accuracy by black and white lists in AMP scenarios. [#60052](https://github.com/PaddlePaddle/Paddle/pull/60052) -- Fix the bug that operators such as add_, and divide_ natively supporting operations with different data types have unanticipated type boosting when type boosting occurs. [#64302](https://github.com/PaddlePaddle/Paddle/pull/64302) +- Optimizations related to the CINN compiler. [#69455](https://github.com/PaddlePaddle/Paddle/pull/69455), [#70284](https://github.com/PaddlePaddle/Paddle/pull/70284), [#67576](https://github.com/PaddlePaddle/Paddle/pull/67576), [#68946](https://github.com/PaddlePaddle/Paddle/pull/68946), [#68615](https://github.com/PaddlePaddle/Paddle/pull/68615) +- Optimizations related to oneDNN. [#68784](https://github.com/PaddlePaddle/Paddle/pull/68784), [#68716](https://github.com/PaddlePaddle/Paddle/pull/68716), [#67554](https://github.com/PaddlePaddle/Paddle/pull/67554) +- Memory-related optimizations. 
[#68660](https://github.com/PaddlePaddle/Paddle/pull/68571), [#69930](https://github.com/PaddlePaddle/Paddle/pull/69930), [#68174](https://github.com/PaddlePaddle/Paddle/pull/68174), [#68660](https://github.com/PaddlePaddle/Paddle/pull/68571), [#70359](https://github.com/PaddlePaddle/Paddle/pull/70359) +- Kernel computation-related optimizations. [#65507](https://github.com/PaddlePaddle/Paddle/pull/65507), [#68541](https://github.com/PaddlePaddle/Paddle/pull/68541), [#71479](https://github.com/PaddlePaddle/Paddle/pull/71479), [#71403](https://github.com/PaddlePaddle/Paddle/pull/71403) +- XPU-related optimizations. [#67051](https://github.com/PaddlePaddle/Paddle/pull/67051) +- Other optimizations include pass optimization of the inference process, dynamic shape optimization in automatic parallelism, and FlashAttention computation optimization. [#68394](https://github.com/PaddlePaddle/Paddle/pull/68394), [#68696](https://github.com/PaddlePaddle/Paddle/pull/68696), [#68759](https://github.com/PaddlePaddle/Paddle/pull/68759), [#68791](https://github.com/PaddlePaddle/Paddle/pull/68791), [#69390](https://github.com/PaddlePaddle/Paddle/pull/69390), [#69961](https://github.com/PaddlePaddle/Paddle/pull/69961), [#69939](https://github.com/PaddlePaddle/Paddle/pull/69939), [#70455](https://github.com/PaddlePaddle/Paddle/pull/70455), [#70663](https://github.com/PaddlePaddle/Paddle/pull/70663), [#71290](https://github.com/PaddlePaddle/Paddle/pull/71123) -## Distributed Strategy Enhancements +### Others -Focus on strengthening the functional experience of PaddlePaddle dynamic graph distributed computing, and make various functional improvements to parallel strategies such as AutoTuner, pipeline parallel, and sharding, and enhance the flexibility of large model training. 
Add the features such as Flash Attention Mask, which significantly reduce the video memory usage of large model training, especially long-sequence training, improve training performance, and provide stronger capability support for large model training. In addition, several bugs and potential security risks have been fixed, which has significantly improved the overall stability of the system. +- Modify function namespaces. [#66818](https://github.com/PaddlePaddle/Paddle/pull/66818), [#67023](https://github.com/PaddlePaddle/Paddle/pull/67023), [#67114](https://github.com/PaddlePaddle/Paddle/pull/67114), [#67217](https://github.com/PaddlePaddle/Paddle/pull/67217), [#67524](https://github.com/PaddlePaddle/Paddle/pull/67524), [#67796](https://github.com/PaddlePaddle/Paddle/pull/67796), [#67881](https://github.com/PaddlePaddle/Paddle/pull/67881) +- Upgrade OneDNN. [#69917](https://github.com/PaddlePaddle/Paddle/pull/69917) +- Modify the pass level. [#69524](https://github.com/PaddlePaddle/Paddle/pull/69524) +- Optimizations related to memory read and write. [#65804](https://github.com/PaddlePaddle/Paddle/pull/65804), [#66923](https://github.com/PaddlePaddle/Paddle/pull/66923) +- Optimize the GetValueName-related signatures. [#66363](https://github.com/PaddlePaddle/Paddle/pull/66363), [#66559](https://github.com/PaddlePaddle/Paddle/pull/66559), [#66738](https://github.com/PaddlePaddle/Paddle/pull/66738) -### Function Optimization +### Discarded -- Optimize the search space of Autotuner, which significantly improves the performance of search. [#62608](https://github.com/PaddlePaddle/Paddle/pull/62608) -- For the problem of pipeline parallel that the training may be wrong due to the checking of sending type in the eval process, add the training configuration, to skip the redundant receiving check of pipelined sending, featuring higher flexibility and better performance.
[#63001](https://github.com/PaddlePaddle/Paddle/pull/63001) -- In the dynamic graph pipeline parallel, add the checking of the size and type of the sent and received data, and add the error message, making the robustness and debuggability better. [#59405](https://github.com/PaddlePaddle/Paddle/pull/59405) -- Support the settings of multiple loss functions with returning multiple losses in dynamic graph pipeline, which improves the flexibility of dynamic graph pipeline. [#63167](https://github.com/PaddlePaddle/Paddle/pull/63167) -- In the dynamic graph pipeline, add the pipeline cache clearing configuration option, to clear the cache sent and received in the pipeline in time to better support dynamic batchsize training. [#62277](https://github.com/PaddlePaddle/Paddle/pull/62277) -- For the problem that the sharding stage3 strategy cannot be aligned bit by bit, replace the unordered set with OrderedSet to avoid the error caused by the accumulation order, as a result of alignment bit by bit after fixing. [#60085](https://github.com/PaddlePaddle/Paddle/pull/60085) -- In order to further reduce the video memory usage in sequence parallel, add a new method of recalculating allgather, to reduce the video memory size of the activation of allgather. [#64244](https://github.com/PaddlePaddle/Paddle/pull/64244) +- Remove obsolete files and functions. [#67514](https://github.com/PaddlePaddle/Paddle/pull/67514), [#67811](https://github.com/PaddlePaddle/Paddle/pull/67811), [#67911](https://github.com/PaddlePaddle/Paddle/pull/67911) -### New Features for Dynamic Graphs +## 7. Inferential deployment -- For the search space of autotuner, add a new search dimension of refined recompute, which makes the search result more accurate and the threshold of model tuning lower. 
[#62430](https://github.com/PaddlePaddle/Paddle/pull/62430) -- For the problem of limiting the training batch size in virtual pipeline parallel, modify the pipeline scheduling method, to flexibly set the batch size, so as to support more flexible batch size. [#61561](https://github.com/PaddlePaddle/Paddle/pull/61561),[#60314](https://github.com/PaddlePaddle/Paddle/pull/60134) -- In order to solve the problem that the video memory occupation of the mask is a quadratic complexity with low performance in sequence length when using flash attention with a mask, the memory complexity of the mask is reduced from the quadrature of the sequence length to the first square by using the sparse mask, to optimize the memory of the mask. This reduces the number of storage accesses. Meanwhile, use share memory to accelerate memory access, greatly improving the performance. [#62029](https://github.com/PaddlePaddle/Paddle/pull/62029) -- Add the dynamic graph sharding parallel strategy, to improve the communications and computation overlap function, to improve the performance of the training process. [#60455](https://github.com/PaddlePaddle/Paddle/pull/60455) +This release focuses on two core directions, **building the ecosystem for the new-generation Paddle Intermediate Representation (PIR)** and **large model inference optimization**. The main breakthroughs include: -### Communication Library Function Optimization +1. **Deep integration of PIR and TensorRT** -- Enhance the functionality of the NCCL communication library to support the initialization of customized NCCL libraries by passing additional initialization parameters during initialization. [#62193](https://github.com/PaddlePaddle/Paddle/pull/62193) -- Add the NCCL library path search function to support more flexible NCCL library search methods.
[#62492](https://github.com/PaddlePaddle/Paddle/pull/62492) +- Completed the refactoring and code optimization of the core execution mechanism, and developed over 50 operator converters +- Added low-precision support (FP16/INT8) and Generic Plugin execution capability +- Built a complete unit testing system that covers the entire process of model loading/saving -### Bug Fixing +2. **Leap in large model inference performance** -- Fix the problem of dbias_out space application of fused_linear_param_grad_add_kernel operator, and add the gradient address checking logic to make the error message easier to debug. [#363433](https://github.com/PaddlePaddle/Paddle/pull/63433),[#64460](https://github.com/PaddlePaddle/Paddle/pull/64460) -- Fix the problem that the sharding policy does not scale the gradient when comm_overlap is turned off in the support of reduce_avg operation. [#62702](https://github.com/PaddlePaddle/Paddle/pull/62702) -- Fix the bug related to fusion in the calculation order of main grad in Stage2. [#59142](https://github.com/PaddlePaddle/Paddle/pull/59142) -- Fix the bug that the switch attribute cannot be found when reduce_avg communication operation is turned on under the sharding strategy. [#62502](https://github.com/PaddlePaddle/Paddle/pull/62502) -- Fix the problem of setting stop_gradient=True for some parameters when Sharding stage1 training supports non-training parameter training. [#62616](https://github.com/PaddlePaddle/Paddle/pull/62616) -- Fix the bug of message printing when TCP is turned off, to prevent misleading users. [#62631](https://github.com/PaddlePaddle/Paddle/pull/62631) -- Fix the DataParallel training problem and solve multi-card training error when some gradients are not initialized and segmentation fault error occurs in data parallel training. [#62299](https://github.com/PaddlePaddle/Paddle/pull/62299) -- For the scenario of turning on sequence parallel, fix the bug caused by weight freezing in some models.
[#63596](https://github.com/PaddlePaddle/Paddle/pull/63596) -- Fix some bugs for autotuner scenarios with single dp. [#60757](https://github.com/PaddlePaddle/Paddle/pull/60757) -- Fix aadiff bug of streaming parallel strategy. ([#64716](https://github.com/PaddlePaddle/Paddle/pull/64716)) -- Remove some distributed unit tests. ([#62762](https://github.com/PaddlePaddle/Paddle/pull/62762)) +- Added full-process support for the Mixture of Experts (MoE) system, covering Hopper architecture optimization +- Added support for processing 128K ultra-long sequences, enhancing long-text inference capabilities +- Implemented cutting-edge quantization schemes such as FP8/W8A8 to reduce memory usage -### Security Risk Fixing +3. **Comprehensive upgrade of infrastructure** -- Fix security vulnerability against security leakage risk in prune_by_memory_estimation operator. [#61320](https://github.com/PaddlePaddle/Paddle/pull/61320) +- OneDNN has been upgraded to version 3.6, significantly enhancing CPU inference performance +- Model loading speed optimized by over 40%, supporting fast loading of PIR models +- Improved distributed inference support and fixed allreduce data type issues -## Parameter Server +### New Features -This update mainly fixes several bugs in the process of using the parameter server as well as compilation and installation issues. +- Support Paddle-TensorRT based on PaddlePaddle's new generation of intermediate representation (PIR) +- Developed the core execution mechanism and optimized its code. [#64995](https://github.com/PaddlePaddle/Paddle/pull/64995), [#67054](https://github.com/PaddlePaddle/Paddle/pull/67054), [#67660](https://github.com/PaddlePaddle/Paddle/pull/67660), [#67755](https://github.com/PaddlePaddle/Paddle/pull/67755), [#70762](https://github.com/PaddlePaddle/Paddle/pull/70762) +- Developed operator Markers and Converters.
[#67753](https://github.com/PaddlePaddle/Paddle/pull/67753), [#67956](https://github.com/PaddlePaddle/Paddle/pull/67956), [#68084](https://github.com/PaddlePaddle/Paddle/pull/68084), [#67974](https://github.com/PaddlePaddle/Paddle/pull/67974), [#68395](https://github.com/PaddlePaddle/Paddle/pull/68395), [#68216](https://github.com/PaddlePaddle/Paddle/pull/68216), [#68529](https://github.com/PaddlePaddle/Paddle/pull/68529), [#68608](https://github.com/PaddlePaddle/Paddle/pull/68608), [#68663](https://github.com/PaddlePaddle/Paddle/pull/68663), [#68757](https://github.com/PaddlePaddle/Paddle/pull/68757), [#68614](https://github.com/PaddlePaddle/Paddle/pull/68614), [#68783](https://github.com/PaddlePaddle/Paddle/pull/68783), [#68775](https://github.com/PaddlePaddle/Paddle/pull/68775), [#68839](https://github.com/PaddlePaddle/Paddle/pull/68839), [#68686](https://github.com/PaddlePaddle/Paddle/pull/68686), [#68840](https://github.com/PaddlePaddle/Paddle/pull/68840), [#68941](https://github.com/PaddlePaddle/Paddle/pull/68941), [#69015](https://github.com/PaddlePaddle/Paddle/pull/69015), [#69038](https://github.com/PaddlePaddle/Paddle/pull/69038), [#69117](https://github.com/PaddlePaddle/Paddle/pull/69117), [#69208](https://github.com/PaddlePaddle/Paddle/pull/69208), [#69315](https://github.com/PaddlePaddle/Paddle/pull/69315), [#69261](https://github.com/PaddlePaddle/Paddle/pull/69261), [#68878](https://github.com/PaddlePaddle/Paddle/pull/68878), [#69705](https://github.com/PaddlePaddle/Paddle/pull/69705), [#69706](https://github.com/PaddlePaddle/Paddle/pull/69706), [#70170](https://github.com/PaddlePaddle/Paddle/pull/70170), [#70267](https://github.com/PaddlePaddle/Paddle/pull/70267), [#70429](https://github.com/PaddlePaddle/Paddle/pull/70429), [#69330](https://github.com/PaddlePaddle/Paddle/pull/69330), [#70507](https://github.com/PaddlePaddle/Paddle/pull/70507), [#70535](https://github.com/PaddlePaddle/Paddle/pull/70535), [#70667](https://github.com/PaddlePaddle/Paddle/pull/70667), [#70816](https://github.com/PaddlePaddle/Paddle/pull/70816), [#70826](https://github.com/PaddlePaddle/Paddle/pull/70826), [#70955](https://github.com/PaddlePaddle/Paddle/pull/70955), [#71028](https://github.com/PaddlePaddle/Paddle/pull/71028), [#71013](https://github.com/PaddlePaddle/Paddle/pull/71013), [#71157](https://github.com/PaddlePaddle/Paddle/pull/71157), [#71231](https://github.com/PaddlePaddle/Paddle/pull/71231), [#69199](https://github.com/PaddlePaddle/Paddle/pull/69199), [#68956](https://github.com/PaddlePaddle/Paddle/pull/68956), [#66658](https://github.com/PaddlePaddle/Paddle/pull/66658), [#66811](https://github.com/PaddlePaddle/Paddle/pull/66811), [#67519](https://github.com/PaddlePaddle/Paddle/pull/67519), [#67877](https://github.com/PaddlePaddle/Paddle/pull/67877), [#68090](https://github.com/PaddlePaddle/Paddle/pull/68090), [#69086](https://github.com/PaddlePaddle/Paddle/pull/69086), [#68787](https://github.com/PaddlePaddle/Paddle/pull/68787), [#68778](https://github.com/PaddlePaddle/Paddle/pull/68778), [#69318](https://github.com/PaddlePaddle/Paddle/pull/69318), [#69995](https://github.com/PaddlePaddle/Paddle/pull/69995), [#70325](https://github.com/PaddlePaddle/Paddle/pull/70325), [#70817](https://github.com/PaddlePaddle/Paddle/pull/70817), [#70879](https://github.com/PaddlePaddle/Paddle/pull/70879), [#70875](https://github.com/PaddlePaddle/Paddle/pull/70875), [#71041](https://github.com/PaddlePaddle/Paddle/pull/71041), [#68876](https://github.com/PaddlePaddle/Paddle/pull/68876) +- Support for Generic Plugin execution function. [#66634](https://github.com/PaddlePaddle/Paddle/pull/66634), [#70251](https://github.com/PaddlePaddle/Paddle/pull/70251) +- Low-precision (FP16, INT8) function support.
[#69597](https://github.com/PaddlePaddle/Paddle/pull/69597), [#71127](https://github.com/PaddlePaddle/Paddle/pull/71127) +- Improved auxiliary functions such as the unit test system and pass usage support. [#67525](https://github.com/PaddlePaddle/Paddle/pull/67525), [#68034](https://github.com/PaddlePaddle/Paddle/pull/68034), [#71281](https://github.com/PaddlePaddle/Paddle/pull/71281), [#71235](https://github.com/PaddlePaddle/Paddle/pull/71235), [#67568](https://github.com/PaddlePaddle/Paddle/pull/67568), [#70139](https://github.com/PaddlePaddle/Paddle/pull/70139), [#70529](https://github.com/PaddlePaddle/Paddle/pull/70529) +- Large model inference optimization +- Added fused_moe function support (basic support/non-standard TopK/Hopper architecture) [#66084](https://github.com/PaddlePaddle/Paddle/pull/66084), [#67425](https://github.com/PaddlePaddle/Paddle/pull/67425), [#67732](https://github.com/PaddlePaddle/Paddle/pull/67732) +- Support for mixed precision computation (GQA mixed precision/BF16 registration) [#65078](https://github.com/PaddlePaddle/Paddle/pull/65078), [#67769](https://github.com/PaddlePaddle/Paddle/pull/67769) +- Added inference optimization features (dynamic graph inference/support for 128K long sequences) [#65962](https://github.com/PaddlePaddle/Paddle/pull/65962), [#70088](https://github.com/PaddlePaddle/Paddle/pull/70088) +- Added implementation of quantization inference operator (FP8 W8A8 computation/weight-only int4 quantization) [#65441](https://github.com/PaddlePaddle/Paddle/pull/65441), [#64094](https://github.com/PaddlePaddle/Paddle/pull/64094) + +### Functional improvements + +- Improved the inference mechanisms under PIR +- The executor supports loading .json models [#65223](https://github.com/PaddlePaddle/Paddle/pull/65223) +- Support controllable PIR mode switch-on/off [#65596](https://github.com/PaddlePaddle/Paddle/pull/65596) +- Improved the large model inference mechanisms +- Optimized gemm algorithm
search (cublaslt global search/offline caching) [#65597](https://github.com/PaddlePaddle/Paddle/pull/65597), [#66132](https://github.com/PaddlePaddle/Paddle/pull/66132) +- Enhance type system compatibility (PD_VISIT_FLOATING_AND_HALF_TYPES) [#71022](https://github.com/PaddlePaddle/Paddle/pull/71022) +- Optimized attention mechanism (support for multiple blocks of MMHA/XPU) [#67211](https://github.com/PaddlePaddle/Paddle/pull/67211), [#68104](https://github.com/PaddlePaddle/Paddle/pull/68104) + +### Performance optimization + +- OneDNN has been upgraded to version 3.6, resulting in a general improvement in model inference performance on GNR/EMR devices [#69386](https://github.com/PaddlePaddle/Paddle/pull/69386) +- Operator performance optimization (layer_norm/top_p_sampling) [#65711](https://github.com/PaddlePaddle/Paddle/pull/65711) +- Model loading acceleration (regular/PIR model) [#69110](https://github.com/PaddlePaddle/Paddle/pull/69110), [#70219](https://github.com/PaddlePaddle/Paddle/pull/70219) + +### Bug fixes + +- Fixed issues related to Predictor when saving/loading PIR models. [#65180](https://github.com/PaddlePaddle/Paddle/pull/65180), [#65019](https://github.com/PaddlePaddle/Paddle/pull/65019), [#65714](https://github.com/PaddlePaddle/Paddle/pull/65714), [#69619](https://github.com/PaddlePaddle/Paddle/pull/69619), [#67570](https://github.com/PaddlePaddle/Paddle/pull/67570), [#65595](https://github.com/PaddlePaddle/Paddle/pull/65595), [#69200](https://github.com/PaddlePaddle/Paddle/pull/69200) +- Fixed execution issues of inference unit tests in scenarios such as PIR and multiple hardware configurations.
[#65763](https://github.com/PaddlePaddle/Paddle/pull/65763),[#66481](https://github.com/PaddlePaddle/Paddle/pull/66481),[#67105](https://github.com/PaddlePaddle/Paddle/pull/67105),[#67248](https://github.com/PaddlePaddle/Paddle/pull/67248),[#67470](https://github.com/PaddlePaddle/Paddle/pull/67470),[#67638](https://github.com/PaddlePaddle/Paddle/pull/67638),[#68135](https://github.com/PaddlePaddle/Paddle/pull/68135),[#68191](https://github.com/PaddlePaddle/Paddle/pull/68191),[#68211](https://github.com/PaddlePaddle/Paddle/pull/68211),[#68160](https://github.com/PaddlePaddle/Paddle/pull/68160),[#68185](https://github.com/PaddlePaddle/Paddle/pull/68185),[#68127](https://github.com/PaddlePaddle/Paddle/pull/68127),[#68887](https://github.com/PaddlePaddle/Paddle/pull/68887),[#69191](https://github.com/PaddlePaddle/Paddle/pull/69191), [#70961](https://github.com/PaddlePaddle/Paddle/pull/70961),[#68020](https://github.com/PaddlePaddle/Paddle/pull/68020),[#67923](https://github.com/PaddlePaddle/Paddle/pull/67923),[#67963](https://github.com/PaddlePaddle/Paddle/pull/67963),[#68482](https://github.com/PaddlePaddle/Paddle/pull/68482),[#68546](https://github.com/PaddlePaddle/Paddle/pull/68546),[#68593](https://github.com/PaddlePaddle/Paddle/pull/68593),[#68793](https://github.com/PaddlePaddle/Paddle/pull/68793) +- Fixed issues related to Paddle TensorRT conversion and execution. 
[#66932](https://github.com/PaddlePaddle/Paddle/pull/66932),[#66655](https://github.com/PaddlePaddle/Paddle/pull/66655),[#67274](https://github.com/PaddlePaddle/Paddle/pull/67274),[#67504](https://github.com/PaddlePaddle/Paddle/pull/67504),[#65780](https://github.com/PaddlePaddle/Paddle/pull/65780),[#68170](https://github.com/PaddlePaddle/Paddle/pull/68170),[#68647](https://github.com/PaddlePaddle/Paddle/pull/68647),[#68776](https://github.com/PaddlePaddle/Paddle/pull/68776),[#69573](https://github.com/PaddlePaddle/Paddle/pull/69573),[#69598](https://github.com/PaddlePaddle/Paddle/pull/69598),[#69510](https://github.com/PaddlePaddle/Paddle/pull/69510),[#69864](https://github.com/PaddlePaddle/Paddle/pull/69864),[#69885](https://github.com/PaddlePaddle/Paddle/pull/69885),[#70161](https://github.com/PaddlePaddle/Paddle/pull/70161),[#70116](https://github.com/PaddlePaddle/Paddle/pull/70116),[#70791](https://github.com/PaddlePaddle/Paddle/pull/70791),[#70801](https://github.com/PaddlePaddle/Paddle/pull/70801),[#70824](https://github.com/PaddlePaddle/Paddle/pull/70824),[#70939](https://github.com/PaddlePaddle/Paddle/pull/70939), [#71143](https://github.com/PaddlePaddle/Paddle/pull/71143),[#71154](https://github.com/PaddlePaddle/Paddle/pull/71154),[#71163](https://github.com/PaddlePaddle/Paddle/pull/71163),[#71183](https://github.com/PaddlePaddle/Paddle/pull/71183),[#71233](https://github.com/PaddlePaddle/Paddle/pull/71233),[#71287](https://github.com/PaddlePaddle/Paddle/pull/71287),[#71319](https://github.com/PaddlePaddle/Paddle/pull/71319),[#67720](https://github.com/PaddlePaddle/Paddle/pull/67720),[#69671](https://github.com/PaddlePaddle/Paddle/pull/69671),[#70168](https://github.com/PaddlePaddle/Paddle/pull/70168),[#69957](https://github.com/PaddlePaddle/Paddle/pull/69957) +- Fixed issues related to Paddle Inference compilation and linking. 
[#65846](https://github.com/PaddlePaddle/Paddle/pull/65846), [#67081](https://github.com/PaddlePaddle/Paddle/pull/67081), [#63184](https://github.com/PaddlePaddle/Paddle/pull/63184) +- Fixed quantization issues. [#67839](https://github.com/PaddlePaddle/Paddle/pull/67839), [#68049](https://github.com/PaddlePaddle/Paddle/pull/68049), [#70099](https://github.com/PaddlePaddle/Paddle/pull/70099), [#64878](https://github.com/PaddlePaddle/Paddle/pull/64878), [#65717](https://github.com/PaddlePaddle/Paddle/pull/65717), [#67552](https://github.com/PaddlePaddle/Paddle/pull/67552), [#67715](https://github.com/PaddlePaddle/Paddle/pull/67715) +- Fixed OneDNN inference issues. [#67836](https://github.com/PaddlePaddle/Paddle/pull/67836), [#68021](https://github.com/PaddlePaddle/Paddle/pull/68021), [#68132](https://github.com/PaddlePaddle/Paddle/pull/68132), [#71426](https://github.com/PaddlePaddle/Paddle/pull/71426), [#68057](https://github.com/PaddlePaddle/Paddle/pull/68057) +- Fixed memory issues. [#68631](https://github.com/PaddlePaddle/Paddle/pull/68631), [#69129](https://github.com/PaddlePaddle/Paddle/pull/69129), [#70314](https://github.com/PaddlePaddle/Paddle/pull/70314), [#67863](https://github.com/PaddlePaddle/Paddle/pull/67863) +- Fixed OpenVINO-related issues in Paddle Inference. [#70212](https://github.com/PaddlePaddle/Paddle/pull/70212), [#70288](https://github.com/PaddlePaddle/Paddle/pull/70288) +- Fixed issues related to Pass.
[#65349](https://github.com/PaddlePaddle/Paddle/pull/65349),[#65421](https://github.com/PaddlePaddle/Paddle/pull/65421),[#65677](https://github.com/PaddlePaddle/Paddle/pull/65677),[#66850](https://github.com/PaddlePaddle/Paddle/pull/66850),[#67443](https://github.com/PaddlePaddle/Paddle/pull/67443),[#67620](https://github.com/PaddlePaddle/Paddle/pull/67620),[#68158](https://github.com/PaddlePaddle/Paddle/pull/68158),[#68642](https://github.com/PaddlePaddle/Paddle/pull/68642),[#68837](https://github.com/PaddlePaddle/Paddle/pull/68837),[#68880](https://github.com/PaddlePaddle/Paddle/pull/68880),[#68935](https://github.com/PaddlePaddle/Paddle/pull/68935),[#69112](https://github.com/PaddlePaddle/Paddle/pull/69112),[#69205](https://github.com/PaddlePaddle/Paddle/pull/69205),[#69242](https://github.com/PaddlePaddle/Paddle/pull/69242),[#69352](https://github.com/PaddlePaddle/Paddle/pull/69352),[#69421](https://github.com/PaddlePaddle/Paddle/pull/69421),[#69690](https://github.com/PaddlePaddle/Paddle/pull/69690), +- Fixed other issues. 
[#70237](https://github.com/PaddlePaddle/Paddle/pull/70237), [#68173](https://github.com/PaddlePaddle/Paddle/pull/68173) +- Fixed issues related to fused_moe (testing/GEMM/WINT4/multi-architecture compatibility/Bias optional) [#67353](https://github.com/PaddlePaddle/Paddle/pull/67353), [#67396](https://github.com/PaddlePaddle/Paddle/pull/67396), [#67717](https://github.com/PaddlePaddle/Paddle/pull/67717), [#67794](https://github.com/PaddlePaddle/Paddle/pull/67794), [#67783](https://github.com/PaddlePaddle/Paddle/pull/67783) +- Fixed issues in the block_attention series (GQA discrepancy/out-of-bounds risk/multi-head support) [#67175](https://github.com/PaddlePaddle/Paddle/pull/67175), [#69001](https://github.com/PaddlePaddle/Paddle/pull/69001), [#70763](https://github.com/PaddlePaddle/Paddle/pull/70763) +- Fixed PIR-related issues (layout conversion/BF16 replacement errors) [#66977](https://github.com/PaddlePaddle/Paddle/pull/66977), [#67830](https://github.com/PaddlePaddle/Paddle/pull/67830) +- Fixed distributed-related issues (allreduce data type/parameter synchronization) [#67449](https://github.com/PaddlePaddle/Paddle/pull/67449), [#69157](https://github.com/PaddlePaddle/Paddle/pull/69157) +- Fixed kernel execution issues (forward-backward conflict/default stream argsort) [#67218](https://github.com/PaddlePaddle/Paddle/pull/67218), [#68374](https://github.com/PaddlePaddle/Paddle/pull/68374) +- Other key fixes (reducing the size of the C++ library/fixing RoPE calculation in NeoX format/fixing static graph execution) [#66041](https://github.com/PaddlePaddle/Paddle/pull/66041), [#66583](https://github.com/PaddlePaddle/Paddle/pull/66583), [#67580](https://github.com/PaddlePaddle/Paddle/pull/67580) + +### Other modifications + +- Code cleanup and maintenance (API deprecation/compilation warning fixes) [#68048](https://github.com/PaddlePaddle/Paddle/pull/68048), [#70384](https://github.com/PaddlePaddle/Paddle/pull/70384) +- Third-party integration optimization 
(OpenVINO submodule management) [#70313](https://github.com/PaddlePaddle/Paddle/pull/70313), [#70425](https://github.com/PaddlePaddle/Paddle/pull/70425) + +## 8. Hardware adaptation + +Continued to improve and upgrade functionality on platforms such as Kunlun and Haiguang to enhance the user experience. -### Bug Fixing +### New Features -- For the problem of reading and writing out of bounds of the unique operator, fix the problem of setting the wrong length in the calculation process of the unique operator to ensure the correctness of the operation of the unique operator. [#60840](https://github.com/PaddlePaddle/Paddle/pull/60840) -- Fixed some bugs in PGLBox save/load and compilation process to ensure the correctness of PGLBox function in response to the lack of save/load function and compilation error in PGLBox training process. [#63905](https://github.com/PaddlePaddle/Paddle/pull/63905) -- Fix the setting value of use_ps_gpu in CPUPS to ensure the correctness of the CPUPS training process, in response to the problem that the CPUPS training process triggers the GPUPS logic and causes the training to crash. [#61406](https://github.com/PaddlePaddle/Paddle/pull/61406) -- For the problem that the cudaErrorInvalidResourceHandle error occurs in GPUPS training in CUDA 12.3, add the device id switching mechanism, to ensure that the corresponding resource operation is carried out on the correct device. [#63391](https://github.com/PaddlePaddle/Paddle/pull/63391) -- For the problem of garbled codes in PGLBox Embedding Dump process, fix the bug of improper use of C++ std::string, to ensure the correctness of Embedding Dump results. 
[#65179](https://github.com/PaddlePaddle/Paddle/pull/65179) +Added ops and improved functionality on Kunlun Core XPU. The ops involved include: flash attention/flash_attn_unpadded, multinomial, matmul, repeat_interleave, logsumexp, index_put_grad, mean_grad, pow, pow_grad, rsqrt, full, rms_norm, rms_norm_grad, put_along_axis, cumsum, argmin, masked_select/grad, expand_v2/grad, all2all, expand, reduce_sum, reduce_max, reduce_min, moe, fused_linear_param_grad_add, adamw, clip/clip_grad, tan, acos, blha_get_max_len, gather/gather_grad, scatter/scatter_grad, round, index_select/index_select_grad, isfinite, isinf, quantize_linear, dequantize_linear, conv3d_transpose, logsumexp_grad, index_add_grad, eye, gather_element, tril, triu, set_value_grad, argmax, take_along_axis, etc. +[#65413](https://github.com/PaddlePaddle/Paddle/pull/65413), [#64846](https://github.com/PaddlePaddle/Paddle/pull/64846), [#65656](https://github.com/PaddlePaddle/Paddle/pull/65656), [#65963](https://github.com/PaddlePaddle/Paddle/pull/65963), [#66143](https://github.com/PaddlePaddle/Paddle/pull/66143), [#66482](https://github.com/PaddlePaddle/Paddle/pull/66482), [#66585](https://github.com/PaddlePaddle/Paddle/pull/66585), [#67077](https://github.com/PaddlePaddle/Paddle/pull/67077), [#67173](https://github.com/PaddlePaddle/Paddle/pull/67173), [#67551](https://github.com/PaddlePaddle/Paddle/pull/67551), [#63989](https://github.com/PaddlePaddle/Paddle/pull/63989), [#67919](https://github.com/PaddlePaddle/Paddle/pull/67919), [#68052](https://github.com/PaddlePaddle/Paddle/pull/68052), [#68176](https://github.com/PaddlePaddle/Paddle/pull/68176), [#68408](https://github.com/PaddlePaddle/Paddle/pull/68408), [#68454](https://github.com/PaddlePaddle/Paddle/pull/68454), [#68478](https://github.com/PaddlePaddle/Paddle/pull/68478), [#68473](https://github.com/PaddlePaddle/Paddle/pull/68473), [#68453](https://github.com/PaddlePaddle/Paddle/pull/68453), 
[#68770](https://github.com/PaddlePaddle/Paddle/pull/68770), [#68933](https://github.com/PaddlePaddle/Paddle/pull/68933), [#69042](https://github.com/PaddlePaddle/Paddle/pull/69042), [#68713](https://github.com/PaddlePaddle/Paddle/pull/68713), [#69368](https://github.com/PaddlePaddle/Paddle/pull/69368), [#69723](https://github.com/PaddlePaddle/Paddle/pull/69723), [#69767](https://github.com/PaddlePaddle/Paddle/pull/69767), [#69898](https://github.com/PaddlePaddle/Paddle/pull/69898), [#69970](https://github.com/PaddlePaddle/Paddle/pull/69970), [#69771](https://github.com/PaddlePaddle/Paddle/pull/69771), [#70176](https://github.com/PaddlePaddle/Paddle/pull/70176), [#70428](https://github.com/PaddlePaddle/Paddle/pull/70428), [#70573](https://github.com/PaddlePaddle/Paddle/pull/70573), [#70576](https://github.com/PaddlePaddle/Paddle/pull/70576), [#70633](https://github.com/PaddlePaddle/Paddle/pull/70633), [#70114](https://github.com/PaddlePaddle/Paddle/pull/70114), [#70627](https://github.com/PaddlePaddle/Paddle/pull/70627), [#71038](https://github.com/PaddlePaddle/Paddle/pull/71038), [#71132](https://github.com/PaddlePaddle/Paddle/pull/71132), [#71228](https://github.com/PaddlePaddle/Paddle/pull/71228), [#71274](https://github.com/PaddlePaddle/Paddle/pull/71274), [#71364](https://github.com/PaddlePaddle/Paddle/pull/71364), [#71375](https://github.com/PaddlePaddle/Paddle/pull/71375), [#71431](https://github.com/PaddlePaddle/Paddle/pull/71431), [#71451](https://github.com/PaddlePaddle/Paddle/pull/71451), [#67585](https://github.com/PaddlePaddle/Paddle/pull/67585), [#67637](https://github.com/PaddlePaddle/Paddle/pull/67637), [#67914](https://github.com/PaddlePaddle/Paddle/pull/67914), [#67641](https://github.com/PaddlePaddle/Paddle/pull/67641), [#67913](https://github.com/PaddlePaddle/Paddle/pull/67913), [#67955](https://github.com/PaddlePaddle/Paddle/pull/67955), [#68411](https://github.com/PaddlePaddle/Paddle/pull/68411), 
[#68560](https://github.com/PaddlePaddle/Paddle/pull/68560), [#68423](https://github.com/PaddlePaddle/Paddle/pull/68423), [#68894](https://github.com/PaddlePaddle/Paddle/pull/68894), [#71053](https://github.com/PaddlePaddle/Paddle/pull/71053), [#71047](https://github.com/PaddlePaddle/Paddle/pull/71047), [#69056](https://github.com/PaddlePaddle/Paddle/pull/69056), [#70843](https://github.com/PaddlePaddle/Paddle/pull/70843), [#65653](https://github.com/PaddlePaddle/Paddle/pull/65653), [#68023](https://github.com/PaddlePaddle/Paddle/pull/68023), [#67780](https://github.com/PaddlePaddle/Paddle/pull/67780), [#68622](https://github.com/PaddlePaddle/Paddle/pull/68622), [#67215](https://github.com/PaddlePaddle/Paddle/pull/67215) -### Documentation Improvement +Added support for rocsolver and warpctc on Haiguang DCU, and added ops and improved functionality. The ops involved include: flash_attention, hipblaslt, fastgelu, multiclass_nms3. -- Access security warnings in the RPC interface documentation, to remind users that they need to use this interface under secure network conditions. [#64100](https://github.com/PaddlePaddle/Paddle/pull/64100) +[#68066](https://github.com/PaddlePaddle/Paddle/pull/68066), [#69457](https://github.com/PaddlePaddle/Paddle/pull/69457), [#68603](https://github.com/PaddlePaddle/Paddle/pull/68603), [#65599](https://github.com/PaddlePaddle/Paddle/pull/65599), [#70587](https://github.com/PaddlePaddle/Paddle/pull/70587), [#71337](https://github.com/PaddlePaddle/Paddle/pull/71337), [#70173](https://github.com/PaddlePaddle/Paddle/pull/70173) -### Security Enhancement +### Bug fixes -- Fix several code security issues to prevent malicious code injection. 
[#60023](https://github.com/PaddlePaddle/Paddle/pull/60023),[#60544](https://github.com/PaddlePaddle/Paddle/pull/60544),[#60615](https://github.com/PaddlePaddle/Paddle/pull/60615) +Bug fix for OP on Kunlun Core XPU +[#65020](https://github.com/PaddlePaddle/Paddle/pull/65020), [#65251](https://github.com/PaddlePaddle/Paddle/pull/65251), [#65418](https://github.com/PaddlePaddle/Paddle/pull/65418), [#65387](https://github.com/PaddlePaddle/Paddle/pull/65387), [#65525](https://github.com/PaddlePaddle/Paddle/pull/65525), [#65613](https://github.com/PaddlePaddle/Paddle/pull/65613), [#65533](https://github.com/PaddlePaddle/Paddle/pull/65533), [#65705](https://github.com/PaddlePaddle/Paddle/pull/65705), [#65915](https://github.com/PaddlePaddle/Paddle/pull/65915), [#66238](https://github.com/PaddlePaddle/Paddle/pull/66238), [#66485](https://github.com/PaddlePaddle/Paddle/pull/66485), [#67349](https://github.com/PaddlePaddle/Paddle/pull/67349), [#67372](https://github.com/PaddlePaddle/Paddle/pull/67372), [#67276](https://github.com/PaddlePaddle/Paddle/pull/67276), [#67460](https://github.com/PaddlePaddle/Paddle/pull/67460), [#67496](https://github.com/PaddlePaddle/Paddle/pull/67496), [#67530](https://github.com/PaddlePaddle/Paddle/pull/67530), [#67828](https://github.com/PaddlePaddle/Paddle/pull/67828), [#68010](https://github.com/PaddlePaddle/Paddle/pull/68010), [#68157](https://github.com/PaddlePaddle/Paddle/pull/68157), [#68172](https://github.com/PaddlePaddle/Paddle/pull/68172), [#68388](https://github.com/PaddlePaddle/Paddle/pull/68388), [#68213](https://github.com/PaddlePaddle/Paddle/pull/68213), [#68501](https://github.com/PaddlePaddle/Paddle/pull/68501), [#68504](https://github.com/PaddlePaddle/Paddle/pull/68504), [#68585](https://github.com/PaddlePaddle/Paddle/pull/68585), [#69229](https://github.com/PaddlePaddle/Paddle/pull/69229), [#69374](https://github.com/PaddlePaddle/Paddle/pull/69374), [#69424](https://github.com/PaddlePaddle/Paddle/pull/69424), 
[#69440](https://github.com/PaddlePaddle/Paddle/pull/69440), [#69614](https://github.com/PaddlePaddle/Paddle/pull/69614), [#68542](https://github.com/PaddlePaddle/Paddle/pull/68542), [#69990](https://github.com/PaddlePaddle/Paddle/pull/69990), [#70351](https://github.com/PaddlePaddle/Paddle/pull/70351), [#70479](https://github.com/PaddlePaddle/Paddle/pull/70479), [#70431](https://github.com/PaddlePaddle/Paddle/pull/70431), [#70638](https://github.com/PaddlePaddle/Paddle/pull/70638), [#70856](https://github.com/PaddlePaddle/Paddle/pull/70856), [#70974](https://github.com/PaddlePaddle/Paddle/pull/70974), [#70973](https://github.com/PaddlePaddle/Paddle/pull/70973), [#71027](https://github.com/PaddlePaddle/Paddle/pull/71027), [#71062](https://github.com/PaddlePaddle/Paddle/pull/71062), [#71115](https://github.com/PaddlePaddle/Paddle/pull/71115), [#71110](https://github.com/PaddlePaddle/Paddle/pull/71110), [#70858](https://github.com/PaddlePaddle/Paddle/pull/70858), [#71147](https://github.com/PaddlePaddle/Paddle/pull/71147), [#71212](https://github.com/PaddlePaddle/Paddle/pull/71212), [#71361](https://github.com/PaddlePaddle/Paddle/pull/71361), [#71423](https://github.com/PaddlePaddle/Paddle/pull/71423), [#70859](https://github.com/PaddlePaddle/Paddle/pull/70859), [#71492](https://github.com/PaddlePaddle/Paddle/pull/71492), [#71493](https://github.com/PaddlePaddle/Paddle/pull/71493), [#69826](https://github.com/PaddlePaddle/Paddle/pull/69826), [#67341](https://github.com/PaddlePaddle/Paddle/pull/67341), [#68906](https://github.com/PaddlePaddle/Paddle/pull/68906), [#71171](https://github.com/PaddlePaddle/Paddle/pull/71171) -## Inference Deployment +Bug fix for OP on Haiguang DCU +[#69617](https://github.com/PaddlePaddle/Paddle/pull/69617), [#65716](https://github.com/PaddlePaddle/Paddle/pull/65716), [#66630](https://github.com/PaddlePaddle/Paddle/pull/66630), [#65399](https://github.com/PaddlePaddle/Paddle/pull/65399) -The inference framework is based on PIR upgraded 
PASS under GPU, XPU, CPU hardware, to significantly reduce the number of lines of codes compared with the previous version, and improve development efficiency. The underlying executor is upgraded to a new version of asynchronous executor, improving inference performance on most models. Complete the adaptive interconnection for inference acceleration based on CINN compiler. Add the switches for these features. Users can turn on the features through settings. In addition, Paddle Inference supports direct loading of optimized serialized models under mixed inference with TensorRT subgraphs natively, to reduce startup time consumption. For Paddle-TensorRT, add the interfaces to flexibly control node computation precision and whether the subgraph enters TensorRT computation. It is convenient for debugging. For performance optimization, GPU, XPU, CPU are added with more Transformer and LLM computing acceleration fusion operator, such as group attention mechanism fusion operator, GQA structure, and WINT4, and support for automatic matching by PASS. +### Performance optimization -### New Features +Kunlun Core XPU upgrades the functions of basic components such as streams and optimizes the performance of certain operations. +[#65102](https://github.com/PaddlePaddle/Paddle/pull/65102), [#69727](https://github.com/PaddlePaddle/Paddle/pull/69727), [#69899](https://github.com/PaddlePaddle/Paddle/pull/69899), [#69942](https://github.com/PaddlePaddle/Paddle/pull/69942), [#70025](https://github.com/PaddlePaddle/Paddle/pull/70025), [#70640](https://github.com/PaddlePaddle/Paddle/pull/70640) -- Paddle-TensorRT - - The API called at the underlying of Paddle-TensorRT is upgraded. When the version of TensorRT is later than 8.5, the EnqueueV2 API called (which will be deprecated in the future) is upgraded to the EnqueueV3 API. [#60807](https://github.com/PaddlePaddle/Paddle/pull/60807) - - Add the config.exp_disable_tensorrt_subgraph() to set some subgraphs not to enter TensorRT. 
[#61967](https://github.com/PaddlePaddle/Paddle/pull/61967) - - Add the config.exp_disable_tensorrt_dynamic_shape_ops() to set dynamic shape input operators not to enter TensorRT. The default value is False. [#62352](https://github.com/PaddlePaddle/Paddle/pull/62352) - - Add the config.exp_specify_tensorrt_subgraph_precision() to set nodes to run different precision types. [#62402](https://github.com/PaddlePaddle/Paddle/pull/62402) -- In the Inference, add switch to turn on CINN compiler. When configuring inference config, turn on CINN through config.enable_cinn(). [#61949](https://github.com/PaddlePaddle/Paddle/pull/61949) -- PIR use mechanism in the Inference upgrade - - In the config, add enable_new_ir() interface to enable PIR. [#61968](https://github.com/PaddlePaddle/Paddle/pull/61968) - - In the config, add set_optimization_level() interface to set different optimization levels. [#61968](https://github.com/PaddlePaddle/Paddle/pull/61968) - - In the PIR mechanism, the PASS function supports custom C++PASS. [#62468](https://github.com/PaddlePaddle/Paddle/pull/62468) - - The inference library exposes PIR-related implementation header files to the outside world. Support users' secondary development based on PIR, such as custom Pass development. [#61863](https://github.com/PaddlePaddle/Paddle/pull/61863),[#62293](https://github.com/PaddlePaddle/Paddle/pull/62293) - - The PIR mechanism supports input and output of the Hook operator by registering the Predictor. [#63101](https://github.com/PaddlePaddle/Paddle/pull/63101) -- The multi-layer Transformer fusion operator fused_multi_transformer_op supports GQA calculation. [#64125](https://github.com/PaddlePaddle/Paddle/pull/64125) - -### Function Improvements - -- The inference supports loading optimized models directly, making it possible to skip IR optimization altogether. The deployment in this way can minimize framework overhead. 
[#61598](https://github.com/PaddlePaddle/Paddle/pull/61598) -- Re-specify the shape range information file when loading the saved IR PASS optimized model inference. [#60457](https://github.com/PaddlePaddle/Paddle/pull/60457) -- Collect the Shape information within the subgraph of the control flow operator, supporting the use of Paddle-TensorRT inference acceleration. [#60451](https://github.com/PaddlePaddle/Paddle/pull/60451) ,[#59588](https://github.com/PaddlePaddle/Paddle/pull/59588) -- The mixed-precision PASS (auto_mixed_precision_pass) for GPU-native inference supports the handling of sparse Tensor. [#62656](https://github.com/PaddlePaddle/Paddle/pull/62656) -- XPU hardware related function - - XPU's fused PASS for Conv and FC supports conversion from Float to INT31 type. [#59981](https://github.com/PaddlePaddle/Paddle/pull/59981) - - XPU's strided slice operator supports the setting of strides non-negative. [#62268](https://github.com/PaddlePaddle/Paddle/pull/62268) - - XPU's multi-layer Encoder fusion PASS is adaptive to sequence length and supports variable length. [#63825](https://github.com/PaddlePaddle/Paddle/pull/63825) -- Paddle TensorRT INT8 computation mode supports tile operator into TensorRT computation, to improve INT8 performance of some models. [#60189](https://github.com/PaddlePaddle/Paddle/pull/60189) - -### Model Compression - -Fix bugs and optimize functions mainly for Post Training Quantization (PTQ) and Quantization Aware Training (QAT). - -- Support the simulation quantization grouped by channel. [#61828](https://github.com/PaddlePaddle/Paddle/pull/61828) -- Support automatic saving of quantization scale to model parameter file under dynamic graphs. [#59441](https://github.com/PaddlePaddle/Paddle/pull/59441) -- Remove the restriction that the dataloader must be a DataLoader instance. 
[#61798](https://github.com/PaddlePaddle/Paddle/pull/61798) - -### Performance Optimization - -- Upgrade the inference executor to reduce the video memory usage at runtime while keeping the performance unchanged. This can be used through config.enable_use_executor(True). [#57920](https://github.com/PaddlePaddle/Paddle/pull/57920),[#58452](https://github.com/PaddlePaddle/Paddle/pull/58452),[#63350](https://github.com/PaddlePaddle/Paddle/pull/63350),[#64466](https://github.com/PaddlePaddle/Paddle/pull/64466) -- Upgrade oneDNN version of paddle inference to v3.4. Its overall performance has been improved compared with v3.3. [#64661](https://github.com/PaddlePaddle/Paddle/pull/64661) -- Upgrade the CUTLASS-based support for matrix multiplication and activation fusion calculation. ([#61925](https://github.com/PaddlePaddle/Paddle/pull/61925)) - -#### Add generic PASS in PIR mechanism - -- Add identity_op_clean_pass and matmul_scale_fuse_pass. [#59840](https://github.com/PaddlePaddle/Paddle/pull/59840) -- Add fused_flash_attn_pass. The pass can call flash_attention to replace the original attentions computation. [#64213](https://github.com/PaddlePaddle/Paddle/pull/64213),[#64707](https://github.com/PaddlePaddle/Paddle/pull/64707),[#63304](https://github.com/PaddlePaddle/Paddle/pull/63304) -- In the inference PIR new architecture, upgrade layout adjustment algorithm, support the NHWC inference of conv class and norm class. The performance tested on SD models is significantly improved. [#63628](https://github.com/PaddlePaddle/Paddle/pull/63628),[#64634](https://github.com/PaddlePaddle/Paddle/pull/64634),[#64658](https://github.com/PaddlePaddle/Paddle/pull/64658),[#64708](https://github.com/PaddlePaddle/Paddle/pull/64708),[#64830](https://github.com/PaddlePaddle/Paddle/pull/64830),[#64896](https://github.com/PaddlePaddle/Paddle/pull/64896) -- Add remove_redundant_transpose PASS. 
[#63357](https://github.com/PaddlePaddle/Paddle/pull/63357) -- Enable CSE PASS in inference to improve inference performance. [#64523](https://github.com/PaddlePaddle/Paddle/pull/64523) - -#### GPU Performance Optimizations - -Include new fusion operators and new PASS under PIR mechanism. - -- Optimize the performance of sparse convolution operator (sparse conv) to improve the inference performance of BEV and other models. [#63067](https://github.com/PaddlePaddle/Paddle/pull/63067) -- Add the fusion PASS based on flash attention. [#63220](https://github.com/PaddlePaddle/Paddle/pull/63220) -- The inference supports elementwise_add+group_norm+silu activated operator fusion pattern and its corresponding fusion kernel. [#64199](https://github.com/PaddlePaddle/Paddle/pull/64199) -- The Matrix multiplication calculation supports groupwise's Weight only INT4 calculation. [#60422](https://github.com/PaddlePaddle/Paddle/pull/60422) 、[#63212](https://github.com/PaddlePaddle/Paddle/pull/63212) 、[#60204](https://github.com/PaddlePaddle/Paddle/pull/60204)) -- The implementation of the group attention mechanism fusion operator block_multi_head_attention supports KV Cache quantization. [#59951](https://github.com/PaddlePaddle/Paddle/pull/59951)) -- The Inference uses CUTLASS upgraded conv fusion operator to implement and support PASS automatic fusion. Support bias and activation. Compared to the original cuDNN, the new operator has significant performance acceleration. It is used through config.exp_enable_use_cutlass(True). [#64201](https://github.com/PaddlePaddle/Paddle/pull/64201)、[#64641](https://github.com/PaddlePaddle/Paddle/pull/64641) -- Add the blha_get_max_len operator and remove every call to get_max_len in block_multihead_attention. The function application is used for large model dynamic inference acceleration. 
[#64246](https://github.com/PaddlePaddle/Paddle/pull/64246) -- Data layout optimization: PASS prohibits using NHWC mode calculation in the conv fusion operator FP32 precision type, because cuDNN will cause performance degradation under this condition. [#63400](https://github.com/PaddlePaddle/Paddle/pull/63400) -- GPU peak video memory optimization: upgrade the underlying interface TryShrinkMemory, and upgrade to support GPU place under the support for the release of the idle video memory in the pool. In certain scenarios, peak video memory can be significantly cut. [#61319](https://github.com/PaddlePaddle/Paddle/pull/61319) - -#### CPU performance optimization - -Include new fusion operator. Add PASS under PIR mechanism and optimize part of Kernel. - -- Add scale_matmul_fuse_pass. [#63313](https://github.com/PaddlePaddle/Paddle/pull/63313) -- Add CPU implementation in fused_bias_residual_layernorm and fused_rms_norm to improve inference speed. [#63196](https://github.com/PaddlePaddle/Paddle/pull/63196)、[#63165](https://github.com/PaddlePaddle/Paddle/pull/63165) -- Add the cache optimization for Deconvolution kernel, to greatly improve the execution speed of this operator. [#60922](https://github.com/PaddlePaddle/Paddle/pull/60922) -- In PIR, add depthwise_conv fusion PASS, to convert the depthwise_conv operator to conv2d, thus using the onednn conv2d kernel optimization to improve the inference speed of this operator. [#63051](https://github.com/PaddlePaddle/Paddle/pull/63051) -- In PIR, add Conv and Activation Fusion PASS (conv_activation_mkldnn_fuse_pass), to support the fusion of conv and 13 kinds of activation functions, thus greatly improving the inference speed of conv-related operators. [#63145](https://github.com/PaddlePaddle/Paddle/pull/63145) -- In PIR, add the fusion PASS (operator_unsqueeze_onednn_fuse_pass) between multiple operators and unsqueeze, to improve inference speed. 
[#63592](https://github.com/PaddlePaddle/Paddle/pull/63592) -- In PIR, add PASS (operator_reshape_onednn_fuse_pass) to fuse reshape into multiple operators. [#63812](https://github.com/PaddlePaddle/Paddle/pull/63812) -- In PIR, add scale fusion PASS (operator_scale_onednn_fuse_pass). [#63811](https://github.com/PaddlePaddle/Paddle/pull/63811) -- In PIR, add PASS (conv2d_transpose_bias operator) that fuses conv and bias. [#62241](https://github.com/PaddlePaddle/Paddle/pull/62241) -- In PIR, add onednn_placement_pass, which supports 151 operators to convert from Phi operators to oneDNN operators, so that the oneDNN high-performance library can be used for optimization, to improve the inference speed. [#63982](https://github.com/PaddlePaddle/Paddle/pull/63982) -- In PIR, add the fusion between Elementwise type operators and 13 activation functions, to greatly improve the inference speed of enabling Onednn on the CPU. [#63516](https://github.com/PaddlePaddle/Paddle/pull/63516) -- In PIR, add the fusion of multiple conv + concat + activation functions and fused_conv + concat + activation functions, to greatly improve the inference speed when there are concat and activation functions in conv. [#62993](https://github.com/PaddlePaddle/Paddle/pull/62993)、 [#62713](https://github.com/PaddlePaddle/Paddle/pull/62713) -- In PIR, add matmul+add operator fusion PASS (matmul_elementwise_add_fuse_pass). [#62715](https://github.com/PaddlePaddle/Paddle/pull/62715) -- In PIR, add the scale parameter to fold PASS (scale_matmul_fuse_pass). [#63313](https://github.com/PaddlePaddle/Paddle/pull/63313) -- In PIR, add the fusion PASS (softplus_activation_fuse_pass) between softplus and 12 activation functions. [#63617](https://github.com/PaddlePaddle/Paddle/pull/63617) -- In PIR, add fc operator conversion PASS (fc_onednn_enable_pass). [#63518](https://github.com/PaddlePaddle/Paddle/pull/63518) -- In PIR, add self-attention operator fusion PASS (self_attention_fuse_pass). 
[#63726](https://github.com/PaddlePaddle/Paddle/pull/63726) -- In PIR, add fusion PASS (fc_activation_fuse_pass) between fc and 12 activation functions. [#63853](https://github.com/PaddlePaddle/Paddle/pull/63853) -- In PIR, add BatchNorm folded PASS (conv2d_bn_onednn_fuse_pass) to amplify the fusion probability of subsequent PASS. [#64524](https://github.com/PaddlePaddle/Paddle/pull/64524) -- In PIR, add the fusion PASS (matmul_activation_fuse_pass) between matmul and 12 activation functions. [#62901](https://github.com/PaddlePaddle/Paddle/pull/62901) -- In PIR, add reshape + transpose + reshape fusion PASS (shuffle_channel_detect_pass), which is fused into a shuffle_channel operator under specific conditions. [#64053](https://github.com/PaddlePaddle/Paddle/pull/64053) -- In PIR, add reshape + transpose + matmul fusion PASS (reshape_transpose_matmul_fuse_pass). [#62998](https://github.com/PaddlePaddle/Paddle/pull/62998) -- In PIR, add matmul + transpose + reshape fusion PASS (matmul_transpose_reshape_fuse_pass) to PIR to significantly improve performance in some scenarios. [#63151](https://github.com/PaddlePaddle/Paddle/pull/63151)(https://github.com/PaddlePaddle/Paddle/pull/63151) -- XPU hardware new fusion PASS optimization: - - Add qk_qkv_attention_xpu_fuse_pass and qkv_attention_xpu_kernel in XPU hardware. [#60089](https://github.com/PaddlePaddle/Paddle/pull/60089) - - Add rotary position encoded fusion operator, to support elementwise_mul + strided_slice + sin/cos+ stack fusion to 1 operator in XPU hardware. [#60025](https://github.com/PaddlePaddle/Paddle/pull/60025) - - Add group_norm_silu_xpu_fuse_pass. [#62689](https://github.com/PaddlePaddle/Paddle/pull/62689) - - Add weight_only_linear_xpu_pass. [#64185](https://github.com/PaddlePaddle/Paddle/pull/64185) - - Add block_multihead_attention operator and PASS, to support large model inference for LLaMA2 models in XPU devices. 
[#65036](https://github.com/PaddlePaddle/Paddle/pull/65036) - - Support float16 type for squeeze_excitation_block_xpu_kernel. [#61023](https://github.com/PaddlePaddle/Paddle/pull/61023) - -### Bug Fixing - -- Fix mixed-precision conversions in models such as faster_rcnn_swin_tiny_fpn_1x_coco, and solve the mixed_precision_pass error. [#64673](https://github.com/PaddlePaddle/Paddle/pull/64673) -- Block fused_conv2d_add_act pass from being validated in activation functions that are sigmoid (fused conv2d and sigmoid cause performance degradation between cudnn versions 8.0 and 8.7). [#64717](https://github.com/PaddlePaddle/Paddle/pull/64717) -- Fix compilation issues with self_dp_attention and fused_layer_norm_avx_kernel in Clang12. [#63414](https://github.com/PaddlePaddle/Paddle/pull/63414) -- Fix the issue that scale and zeroPoints in the qdq operator of some models are deleted prematurely in the IR/Pass stage. [#62225](https://github.com/PaddlePaddle/Paddle/pull/62225) -- Fix the issue that causes an error to be reported when both Config.UseOptimizedModel() and config.EnableMemoryOptim() are turned on. [#62501](https://github.com/PaddlePaddle/Paddle/pull/62501) -- Add constraint on matmul_scale_fuse_pass, where input w must be a weight or the pass will not be matched. [#62850](https://github.com/PaddlePaddle/Paddle/pull/62850) -- Keep inference model output key ordering guaranteed to be the same as when dynamic graph models are exported. [#63791](https://github.com/PaddlePaddle/Paddle/pull/63791) -- Fix the error in subgraph when the constant fold PASS is in "the folded op and its input and output are not in the same subgraph." [#62148](https://github.com/PaddlePaddle/Paddle/pull/62148) -- Fix several runtime problems in PaddleTRT mode. Include the failure of quantization calibration table generation caused by yolo_box operator in int8 mode, and the error caused by incorrect handling of dim attribute data type in reduce operator. 
[#61596](https://github.com/PaddlePaddle/Paddle/pull/61596) -- Fix some runtime error problems in mixed-precision inference mode.Include the errors caused by sharing weights among fused conv2d operators without correctly converting weight layout, fused conv2d operator backend not properly selected as cuDNN, fused conv2d operator incorrectly handling bias dimension under NHWC, incorrectly handling input data type of norm class operator. [#60955](https://github.com/PaddlePaddle/Paddle/pull/60955)、[#60076](https://github.com/PaddlePaddle/Paddle/pull/60076)、[#63007](https://github.com/PaddlePaddle/Paddle/pull/63007)、[#63988](https://github.com/PaddlePaddle/Paddle/pull/63988) -- Fix the problem that config.delete_pass function does not take effect. [#61056](https://github.com/PaddlePaddle/Paddle/pull/61056) -- Fix the GC mechanism of While control flow in PIR to recycle unwanted inputs in advance and reduce the peak memory, for example, 2GB memory reduction in LLaMA 7B model. [#63062](https://github.com/PaddlePaddle/Paddle/pull/63062) -- Fix the OneDNN mean kernel rollback error. [#64676](https://github.com/PaddlePaddle/Paddle/pull/64676) -- Fix the conv_bias_fuse_pass strong constraints newly added, e.g., the shape of the bias cannot be 1, so as to ensure the stability of the pass inference result. [#64412](https://github.com/PaddlePaddle/Paddle/pull/64412) -- Fix the conv_elementwise_add_onednn_fuse_pass strong constraints newly added, e.g., conv2d_out and residual_param must have the same size, so that the pass inference is stable. 
[#64448](https://github.com/PaddlePaddle/Paddle/pull/64448) -- Fix the problem of repeatedly inserting quantized inverse-quantization operators under certain circumstances [#63082](https://github.com/PaddlePaddle/Paddle/pull/63082) - -## Hardware Adaptation - -### Adaptation Scheme (Custom Device) - -For PaddlePaddle hardware access, add the daily release supports for 4 hardware Kunlun XPU, Ascend NPU, Hygon DCU and Cambricon MLU this time. Meanwhile, the problems in distributed communications have been fixed through large model training and inference deployment, and performance is optimized through functions such as video memory optimization, and overlap of computation and communication. Furthermore, each hardware is also added to support a large number of BFloat16 data type operators this time, as well as many operator fusion Pass and fusion operators on each hardware. Through the hardware and software together, hardware large Transformer operator library is accessed to fully improve the performance of large models. - -#### New Features - -- Add the support for distributed policy sharding stage1 v2. [#61500](https://github.com/PaddlePaddle/Paddle/pull/61500) -- Support the distributed communication module in BF16 data type.Add some operators to support for BF16 data types such as empty, shape, etc. [#60768](https://github.com/PaddlePaddle/Paddle/pull/60768),[#62140](https://github.com/PaddlePaddle/Paddle/pull/62140),[#62604](https://github.com/PaddlePaddle/Paddle/pull/62604) -- Add the support for get_comm_name interface, support for memory stat function, and support for Profiler to record memory time. [#62556](https://github.com/PaddlePaddle/Paddle/pull/62556),[#61030](https://github.com/PaddlePaddle/Paddle/pull/61030),[#62292](https://github.com/PaddlePaddle/Paddle/pull/62292) -- Add support for some fusion strategies and operators, including silu_fuse_pass, conv_elementwise_add_act_fuse_pass, and generator offset. 
[#60595](https://github.com/PaddlePaddle/Paddle/pull/60595),[#60708](https://github.com/PaddlePaddle/Paddle/pull/60708),[#60616](https://github.com/PaddlePaddle/Paddle/pull/60616) - -#### Performance Optimization - -- The distributed communication strategy Sharing uses asynchronous strategy in Broadcast parameter, to improve the overlap between computation and communication. [#59745](https://github.com/PaddlePaddle/Paddle/pull/59745) -- Add the support for STRIDED Layout operator to improve the performance of the operator. [#62532](https://github.com/PaddlePaddle/Paddle/pull/62532),[#62697](https://github.com/PaddlePaddle/Paddle/pull/62697),[#62649](https://github.com/PaddlePaddle/Paddle/pull/62649) -- Optimize the memory usage of elementwise_mul operator.[#62377](https://github.com/PaddlePaddle/Paddle/pull/62377) - -#### Bug Fixing - -- Fix the bug under the distributed strategy Sharing. [#61942](https://github.com/PaddlePaddle/Paddle/pull/61942),[#62236](https://github.com/PaddlePaddle/Paddle/pull/62236),[#62305](https://github.com/PaddlePaddle/Paddle/pull/62305),[#62535](https://github.com/PaddlePaddle/Paddle/pull/62535),[#62572](https://github.com/PaddlePaddle/Paddle/pull/62572),[#61601](https://github.com/PaddlePaddle/Paddle/pull/61601) -- Fix the problem that the operator cannot be registered due to c_embedding operator is not under PHI namespace. [#60774](https://github.com/PaddlePaddle/Paddle/pull/60774) -- Fix the xccl_comm release issue. [#60465](https://github.com/PaddlePaddle/Paddle/pull/60465) -- Fix data address error caused by index_put operator fallbacking cpu. [#61842](https://github.com/PaddlePaddle/Paddle/pull/61842) -- Fix stream_safe_custom_device_allocator issue. [#63369](https://github.com/PaddlePaddle/Paddle/pull/63369) -- Fix the distributed worker port conflict issue. [#61409](https://github.com/PaddlePaddle/Paddle/pull/61409) -- Fix comm data type to improve device compatibility. 
[#62306](https://github.com/PaddlePaddle/Paddle/pull/62306) -- Unify the use of comm data type to phi::DataType. [#62464](https://github.com/PaddlePaddle/Paddle/pull/62464),[#62562](https://github.com/PaddlePaddle/Paddle/pull/62562) -- Fix the problem of missing precision parameter in PD_ConfigEnableCustomDevice. [#63702](https://github.com/PaddlePaddle/Paddle/pull/63702) - -### Kunlun XPU - -#### New Features - -- Add the support for BF16 data types for some operators, including compare_kernel and add reduce_all_kernel ([#63602](https://github.com/PaddlePaddle/Paddle/pull/63602)), empty([#60212](https://github.com/PaddlePaddle/Paddle/pull/60212)), hybrid_parallel_optimizer([#60213](https://github.com/PaddlePaddle/Paddle/pull/60213)), reduce_max/reduce_min([#60453](https://github.com/PaddlePaddle/Paddle/pull/60453)), all_reduce/concat/split([#62364](https://github.com/PaddlePaddle/Paddle/pull/62364)), tile/tile_grad([#63075](https://github.com/PaddlePaddle/Paddle/pull/63075)), accuracy([#63863](https://github.com/PaddlePaddle/Paddle/pull/63863)), swiglu/set_value([#64070](https://github.com/PaddlePaddle/Paddle/pull/64070)), amp_master_grad([#63865](https://github.com/PaddlePaddle/Paddle/pull/63865)), c_concat ([#63403](https://github.com/PaddlePaddle/Paddle/pull/63403)), flatten ([#63997](https://github.com/PaddlePaddle/Paddle/pull/63997)), compare_op ([#64473](https://github.com/PaddlePaddle/Paddle/pull/64473)), moment1/moment2 ([#62688](https://github.com/PaddlePaddle/Paddle/pull/62688)), fused_rope ([#60064](https://github.com/PaddlePaddle/Paddle/pull/60064)), c_softmax_with_cross_entropy ([#60472](https://github.com/PaddlePaddle/Paddle/pull/60472)), elementwise_pow/square/sin/cos ([#60402](https://github.com/PaddlePaddle/Paddle/pull/60402)), strided_slice ([#60382](https://github.com/PaddlePaddle/Paddle/pull/60382)), tile/sigmoid_grad ([#60119](https://github.com/PaddlePaddle/Paddle/pull/60119)), elementwise_sub/elementwise_div 
([#60386](https://github.com/PaddlePaddle/Paddle/pull/60386)), softmax_with_cross_entropy ([#63759](https://github.com/PaddlePaddle/Paddle/pull/63759)) -- Add the support for INT8 data types for some operators, including multi_encoder_xpu ([#61212](https://github.com/PaddlePaddle/Paddle/pull/61212)), qkv_attention ([#63105](https://github.com/PaddlePaddle/Paddle/pull/63105)) -- Update Kunlun SDK versions including BKCL, XHPC, XCCL, etc. [#59895](https://github.com/PaddlePaddle/Paddle/pull/59895)、[#59888](https://github.com/PaddlePaddle/Paddle/pull/59888)、[#63624](https://github.com/PaddlePaddle/Paddle/pull/63624), [#60305](https://github.com/PaddlePaddle/Paddle/pull/60305), [#62076](https://github.com/PaddlePaddle/Paddle/pull/62076), [#62646](https://github.com/PaddlePaddle/Paddle/pull/62646), [#63520](https://github.com/PaddlePaddle/Paddle/pull/63520), [#64163](https://github.com/PaddlePaddle/Paddle/pull/64163), [#64326](https://github.com/PaddlePaddle/Paddle/pull/64326), [#60617](https://github.com/PaddlePaddle/Paddle/pull/60617), [#60377](https://github.com/PaddlePaddle/Paddle/pull/60377), [#60421](https://github.com/PaddlePaddle/Paddle/pull/60421), [#60598](https://github.com/PaddlePaddle/Paddle/pull/60598), [#61199](https://github.com/PaddlePaddle/Paddle/pull/61199) -- Add the support for memory stat function. [#61116](https://github.com/PaddlePaddle/Paddle/pull/61116) -- Add multi-stream support, to assign default l3/gm buffer size to each stream. [#62729](https://github.com/PaddlePaddle/Paddle/pull/62729) -- Add nonzero operator, to support simulator XPUSIM_SKIP_RUN mode. [#60224](https://github.com/PaddlePaddle/Paddle/pull/60224)。[#60388](https://github.com/PaddlePaddle/Paddle/pull/60388) -- Add stride_slice and stride_slice_grad operators, to support strides < 0. [#62749](https://github.com/PaddlePaddle/Paddle/pull/62749) -- Add rotary_embedding, to support use_neox_rotary_style == True. 
[#64090](https://github.com/PaddlePaddle/Paddle/pull/64090) -- Add fusion Pass and fusion operators including cross_attention ([#63203](https://github.com/PaddlePaddle/Paddle/pull/63203)), fused_bias_act ([#62232](https://github.com/PaddlePaddle/Paddle/pull/62232)), fused_layernorm ([#62228](https://github.com/PaddlePaddle/Paddle/pull/62228)), group_norm_silu_xpu_fuse_pass ([#63342](https://github.com/PaddlePaddle/Paddle/pull/63342)) -- Add the support for distributed policy sharding stage3. [#57457](https://github.com/PaddlePaddle/Paddle/pull/57457) -- Add the support for tf32 fc quantization mode. [#62273](https://github.com/PaddlePaddle/Paddle/pull/62273) -- Add the flash attention operator. [#60065](https://github.com/PaddlePaddle/Paddle/pull/60065) -- Add the roformer relative embedding pass & kernel and support multi_encoder_xpu. [#62089](https://github.com/PaddlePaddle/Paddle/pull/62089) -- Add the support for pp + sharding strategy. [#63640](https://github.com/PaddlePaddle/Paddle/pull/63640) -- Upgrade the XPU communication library architecture to support dynamic-static unified communication library function. [#63817](https://github.com/PaddlePaddle/Paddle/pull/63817) - -#### Performance Optimization - -- Add XHPC buffer manager to improve the performance of Paddle and XHPC memory collaboration. [#63924](https://github.com/PaddlePaddle/Paddle/pull/63924) -- Enhance TensorSetConstantXPU performance and support BF16 data type. [#63920](https://github.com/PaddlePaddle/Paddle/pull/63920),[#61818](https://github.com/PaddlePaddle/Paddle/pull/61818) -- Fusion multiple group norm + silu + conv modules and compress the video memory. [#62892](https://github.com/PaddlePaddle/Paddle/pull/62892) -- Optimize XPU memory allocation in comm manager. 
[#64139](https://github.com/PaddlePaddle/Paddle/pull/64139) -- Optimize operator performance, including mean_all_grad ([#61148](https://github.com/PaddlePaddle/Paddle/pull/61148)), dropout_v2 ([#61029](https://github.com/PaddlePaddle/Paddle/pull/61029)), fused_rotary_position_embedding ([#62846](https://github.com/PaddlePaddle/Paddle/pull/62846)), cross_entropy ([#63159](https://github.com/PaddlePaddle/Paddle/pull/63159)), elementwise_add ([#64289](https://github.com/PaddlePaddle/Paddle/pull/64289)), fused_gemm_epilogue ([#61350](https://github.com/PaddlePaddle/Paddle/pull/61350), check_nan_or_inf ([#60853](https://github.com/PaddlePaddle/Paddle/pull/60853)) - -#### Bug Fixing - -- Fix the tile operator support for 0-dimensional Tensor. [#64279](https://github.com/PaddlePaddle/Paddle/pull/64279) -- Fix the group_norm_silu_fuse_pass. [#63449](https://github.com/PaddlePaddle/Paddle/pull/63449) -- Fix the XPU API GM memory issue. [#60260](https://github.com/PaddlePaddle/Paddle/pull/60260),[#60387](https://github.com/PaddlePaddle/Paddle/pull/60387),[#62940](https://github.com/PaddlePaddle/Paddle/pull/62940) -- Fix the distributed strategy Sharing stage1 v2 bug. [#64209](https://github.com/PaddlePaddle/Paddle/pull/64209) -- Fix the XPU constant issue. [#60763](https://github.com/PaddlePaddle/Paddle/pull/60763) -- Fix some operator issues, including AdamW ([#62251](https://github.com/PaddlePaddle/Paddle/pull/62251)), dropout_v3 ([#62726](https://github.com/PaddlePaddle/Paddle/pull/62726)), softmax([#63780](https://github.com/PaddlePaddle/Paddle/pull/63780)) , fused rope embedding ([#62143](https://github.com/PaddlePaddle/Paddle/pull/62143)), elementwise_add ([#60252](https://github.com/PaddlePaddle/Paddle/pull/60252)), resnet_basic_block ([#62914](https://github.com/PaddlePaddle/Paddle/pull/62914)) -- Fix XPU runtime and installation related issues. 
[#60028](https://github.com/PaddlePaddle/Paddle/pull/60028),[#61970](https://github.com/PaddlePaddle/Paddle/pull/61970) -- Fix XPU compilation bugs. [#63307](https://github.com/PaddlePaddle/Paddle/pull/63307) -- Fix end-side memory related bugs when initializing XPU communication library. [#64396](https://github.com/PaddlePaddle/Paddle/pull/64396) +### Upgrade of underlying hardware base libraries -### Hygon DCU +The base library upgrade adds support for Kunlun Core P800 as well as for basic components +[#65494](https://github.com/PaddlePaddle/Paddle/pull/65494), [#65924](https://github.com/PaddlePaddle/Paddle/pull/65924), [#69752](https://github.com/PaddlePaddle/Paddle/pull/69752), [#70835](https://github.com/PaddlePaddle/Paddle/pull/70835), [#65554](https://github.com/PaddlePaddle/Paddle/pull/65554), [#66998](https://github.com/PaddlePaddle/Paddle/pull/66998), [#65278](https://github.com/PaddlePaddle/Paddle/pull/65278), [#70614](https://github.com/PaddlePaddle/Paddle/pull/70614), [#71012](https://github.com/PaddlePaddle/Paddle/pull/71012), [#71178](https://github.com/PaddlePaddle/Paddle/pull/71178), [#71168](https://github.com/PaddlePaddle/Paddle/pull/71168), [#68740](https://github.com/PaddlePaddle/Paddle/pull/68740), [#71100](https://github.com/PaddlePaddle/Paddle/pull/71100), [#65221](https://github.com/PaddlePaddle/Paddle/pull/65221), [#67983](https://github.com/PaddlePaddle/Paddle/pull/67983) -#### New Features +### Others -- Add the support for Hygon DCU K100. [#63535](https://github.com/PaddlePaddle/Paddle/pull/63535) -- Support the complex64/128 data type and fusion operators such as fused_bias_residual_layernorm, fused_bias_dropout_residual_layer_norm, and rms_norm.
[#63217](https://github.com/PaddlePaddle/Paddle/pull/63217) +Modifications to related modules such as op test +[#65654](https://github.com/PaddlePaddle/Paddle/pull/65654), [#66233](https://github.com/PaddlePaddle/Paddle/pull/66233), [#66728](https://github.com/PaddlePaddle/Paddle/pull/66728), [#67959](https://github.com/PaddlePaddle/Paddle/pull/67959), [#68169](https://github.com/PaddlePaddle/Paddle/pull/68169), [#68418](https://github.com/PaddlePaddle/Paddle/pull/68418), [#68434](https://github.com/PaddlePaddle/Paddle/pull/68434), [#68445](https://github.com/PaddlePaddle/Paddle/pull/68445), [#68877](https://github.com/PaddlePaddle/Paddle/pull/68877), [#68993](https://github.com/PaddlePaddle/Paddle/pull/68993), [#69006](https://github.com/PaddlePaddle/Paddle/pull/69006), [#70471](https://github.com/PaddlePaddle/Paddle/pull/70471), [#70706](https://github.com/PaddlePaddle/Paddle/pull/70706), [#67777](https://github.com/PaddlePaddle/Paddle/pull/67777), [#65698](https://github.com/PaddlePaddle/Paddle/pull/65698), [#68433](https://github.com/PaddlePaddle/Paddle/pull/68433), [#65689](https://github.com/PaddlePaddle/Paddle/pull/65689) -#### Bug Fixing +## 9. Environment update -- Fix compilation error issues in DTK and ROCM version upgrades. [#62832](https://github.com/PaddlePaddle/Paddle/pull/62832),[#62931](https://github.com/PaddlePaddle/Paddle/pull/62931),[#61872](https://github.com/PaddlePaddle/Paddle/pull/61872),[#63738](https://github.com/PaddlePaddle/Paddle/pull/63738) +- We optimized the framework's stability and cross-platform compatibility, fixed issues related to test coverage and compilation environment compatibility, and enhanced support for multiple platforms such as Windows, XPU, and DCU. 
Simultaneously, we streamlined the code structure, removed obsolete code and unnecessary dependent libraries to reduce maintenance costs, upgraded key dependencies such as CUDA, further optimized the CI/CD process, improved build speed, and enhanced overall system stability. -## Environment Updates +### Bug Fixes -In this PaddlePaddle version, we complete the release and update synchronization of the basic dependency libraries, and remove the old dependency libraries that are no longer updated. Complete a number of optimizations to improve compilation efficiency and compatibility, and improve the CI pipeline monitoring function to enhance the user installation experience. Fixe the several known compilation problems, improved the compilation system of paddle, and add some new features. Through the optimizations, the compilation and installation experience of the PaddlePaddle framework is further improved to bring developers a better use and development experience. +- Improve the CI/CD process, fix test cases, resolve compilation and installation issues in different environments, and enhance the stability and cross-environment compatibility of the framework. 
+ [#65627](https://github.com/PaddlePaddle/Paddle/pull/65627), [#65736](https://github.com/PaddlePaddle/Paddle/pull/65736), [#65900](https://github.com/PaddlePaddle/Paddle/pull/65900), [#66069](https://github.com/PaddlePaddle/Paddle/pull/66069), [#67000](https://github.com/PaddlePaddle/Paddle/pull/67000), [#67312](https://github.com/PaddlePaddle/Paddle/pull/67312), [#67432](https://github.com/PaddlePaddle/Paddle/pull/67432), [#67540](https://github.com/PaddlePaddle/Paddle/pull/67540), [#67670](https://github.com/PaddlePaddle/Paddle/pull/67670), [#68449](https://github.com/PaddlePaddle/Paddle/pull/68449), [#70806](https://github.com/PaddlePaddle/Paddle/pull/70806), [#65665](https://github.com/PaddlePaddle/Paddle/pull/65665), [#65652](https://github.com/PaddlePaddle/Paddle/pull/65652), [#70644](https://github.com/PaddlePaddle/Paddle/pull/70644), [#68119](https://github.com/PaddlePaddle/Paddle/pull/68119), [#68466](https://github.com/PaddlePaddle/Paddle/pull/68466), [#68858](https://github.com/PaddlePaddle/Paddle/pull/68858), [#68788](https://github.com/PaddlePaddle/Paddle/pull/68788), [#68934](https://github.com/PaddlePaddle/Paddle/pull/68934), [#69883](https://github.com/PaddlePaddle/Paddle/pull/69883), [#69924](https://github.com/PaddlePaddle/Paddle/pull/69924), [#71187](https://github.com/PaddlePaddle/Paddle/pull/71187), [#70798](https://github.com/PaddlePaddle/Paddle/pull/70798), [#71248](https://github.com/PaddlePaddle/Paddle/pull/71248), [#70512](https://github.com/PaddlePaddle/Paddle/pull/70512), [#71363](https://github.com/PaddlePaddle/Paddle/pull/71363), [#71438](https://github.com/PaddlePaddle/Paddle/pull/71438), [#71291](https://github.com/PaddlePaddle/Paddle/pull/71291) -### New Support +### Improvement and Upgrade -- Support users to install paddle without relying on local cuda and cudnn, thus improving the user installation experience. 
[#60841](https://github.com/PaddlePaddle/Paddle/pull/60841),[#61973](https://github.com/PaddlePaddle/Paddle/pull/61973),[#61862](https://github.com/PaddlePaddle/Paddle/pull/61862),[#61235](https://github.com/PaddlePaddle/Paddle/pull/61235),[#61209](https://github.com/PaddlePaddle/Paddle/pull/61209),[#61653](https://github.com/PaddlePaddle/Paddle/pull/61653),[#64083](https://github.com/PaddlePaddle/Paddle/pull/64083) -- Support CUDA 12.3 completely. Complete the retirement of cuda10.2. [#63356](https://github.com/PaddlePaddle/Paddle/pull/63356),[#60299](https://github.com/PaddlePaddle/Paddle/pull/60299),[#64171](https://github.com/PaddlePaddle/Paddle/pull/64171),[#62189](https://github.com/PaddlePaddle/Paddle/pull/62189),[#63392](https://github.com/PaddlePaddle/Paddle/pull/63392),[#64228](https://github.com/PaddlePaddle/Paddle/pull/64228),[#62498](https://github.com/PaddlePaddle/Paddle/pull/62498),[#64298](https://github.com/PaddlePaddle/Paddle/pull/64298) -- Support Python 3.12 completely, bringing more powerful language features and performance optimizations. Complete the retirement of python3.7. 
[#59875](https://github.com/PaddlePaddle/Paddle/pull/59875),[#59877](https://github.com/PaddlePaddle/Paddle/pull/59877),[#59876](https://github.com/PaddlePaddle/Paddle/pull/59876) -- Upgrade of other paddle-dependent third-party libraries: [#63741](https://github.com/PaddlePaddle/Paddle/pull/63741),[#64447](https://github.com/PaddlePaddle/Paddle/pull/64447),[#60195](https://github.com/PaddlePaddle/Paddle/pull/60195),[#60110](https://github.com/PaddlePaddle/Paddle/pull/60110),[#61509](https://github.com/PaddlePaddle/Paddle/pull/61509) +- Environment upgrades + [#69491](https://github.com/PaddlePaddle/Paddle/pull/69491), [#66560](https://github.com/PaddlePaddle/Paddle/pull/66560), [#65686](https://github.com/PaddlePaddle/Paddle/pull/65686), [#71177](https://github.com/PaddlePaddle/Paddle/pull/71177), [#71284](https://github.com/PaddlePaddle/Paddle/pull/71284), [#69791](https://github.com/PaddlePaddle/Paddle/pull/69791), [#69349](https://github.com/PaddlePaddle/Paddle/pull/69349), [#70944](https://github.com/PaddlePaddle/Paddle/pull/70944), [#65411](https://github.com/PaddlePaddle/Paddle/pull/65411) +- Pipeline merging + [#66815](https://github.com/PaddlePaddle/Paddle/pull/66815), [#67306](https://github.com/PaddlePaddle/Paddle/pull/67306) +- Improvements to the DCU/NPU/KUNLUN pipelines + [#67516](https://github.com/PaddlePaddle/Paddle/pull/67516), [#67629](https://github.com/PaddlePaddle/Paddle/pull/67629), [#67987](https://github.com/PaddlePaddle/Paddle/pull/67987), [#69903](https://github.com/PaddlePaddle/Paddle/pull/69903), [#68448](https://github.com/PaddlePaddle/Paddle/pull/68448), [#70401](https://github.com/PaddlePaddle/Paddle/pull/70401), [#71192](https://github.com/PaddlePaddle/Paddle/pull/71192), [#71197](https://github.com/PaddlePaddle/Paddle/pull/71197), [#68027](https://github.com/PaddlePaddle/Paddle/pull/68027) +- Support for the Windows environment + [#70390](https://github.com/PaddlePaddle/Paddle/pull/70390),
[#70785](https://github.com/PaddlePaddle/Paddle/pull/70785), [#71286](https://github.com/PaddlePaddle/Paddle/pull/71286), [#71414](https://github.com/PaddlePaddle/Paddle/pull/71414), [#68901](https://github.com/PaddlePaddle/Paddle/pull/68901) +- Improvements to third-party libraries + [#71419](https://github.com/PaddlePaddle/Paddle/pull/71419) +- Other optimizations to improve CI stability and execution efficiency + [#67574](https://github.com/PaddlePaddle/Paddle/pull/67574), [#69058](https://github.com/PaddlePaddle/Paddle/pull/69058), [#70610](https://github.com/PaddlePaddle/Paddle/pull/70610), [#67093](https://github.com/PaddlePaddle/Paddle/pull/67093), [#69037](https://github.com/PaddlePaddle/Paddle/pull/69037), [#65213](https://github.com/PaddlePaddle/Paddle/pull/65213), [#65913](https://github.com/PaddlePaddle/Paddle/pull/65913), [#65947](https://github.com/PaddlePaddle/Paddle/pull/65947), [#66479](https://github.com/PaddlePaddle/Paddle/pull/66479), [#71054](https://github.com/PaddlePaddle/Paddle/pull/71054), [#71396](https://github.com/PaddlePaddle/Paddle/pull/71396) -### Compilation Optimizations +### New Features -- Optimize paddle's CMake codes, significantly improving compilation efficiency and experience.
[##59995](https://github.com/PaddlePaddle/Paddle/pull/59995),[#60167](https://github.com/PaddlePaddle/Paddle/pull/60167),[#61052](https://github.com/PaddlePaddle/Paddle/pull/61052),[#59995](https://github.com/PaddlePaddle/Paddle/pull/59995),[#59607](https://github.com/PaddlePaddle/Paddle/pull/59607),[#63093](https://github.com/PaddlePaddle/Paddle/pull/63093),[#63887](https://github.com/PaddlePaddle/Paddle/pull/63887),[#62969](https://github.com/PaddlePaddle/Paddle/pull/62969),[#64007](https://github.com/PaddlePaddle/Paddle/pull/64007),[#59811](https://github.com/PaddlePaddle/Paddle/pull/59811),[#63045](https://github.com/PaddlePaddle/Paddle/pull/63045),[#60235](https://github.com/PaddlePaddle/Paddle/pull/60235),[#60240](https://github.com/PaddlePaddle/Paddle/pull/60240),[#60235](https://github.com/PaddlePaddle/Paddle/pull/60235),[#61411](https://github.com/PaddlePaddle/Paddle/pull/61411),[#61944](https://github.com/PaddlePaddle/Paddle/pull/61944),[#61961](https://github.com/PaddlePaddle/Paddle/pull/61961),[#59990](https://github.com/PaddlePaddle/Paddle/pull/59990),[#59478](https://github.com/PaddlePaddle/Paddle/pull/59478),[#61501](https://github.com/PaddlePaddle/Paddle/pull/61501),[#60066](https://github.com/PaddlePaddle/Paddle/pull/60066),[#64133](https://github.com/PaddlePaddle/Paddle/pull/64133),[#64231](https://github.com/PaddlePaddle/Paddle/pull/64231),[#60087](https://github.com/PaddlePaddle/Paddle/pull/60087),[#60348](https://github.com/PaddlePaddle/Paddle/pull/60348),[#60737](https://github.com/PaddlePaddle/Paddle/pull/60737),[#61364](https://github.com/PaddlePaddle/Paddle/pull/61364),[#63214](https://github.com/PaddlePaddle/Paddle/pull/63214),[#62454](https://github.com/PaddlePaddle/Paddle/pull/62454),[#62473](https://github.com/PaddlePaddle/Paddle/pull/62473),[#63692](https://github.com/PaddlePaddle/Paddle/pull/63692),[#63950](https://github.com/PaddlePaddle/Paddle/pull/63950) -- Support C++ unit test link dynamic library under linux and windowx, greatly 
reducing the size of C++ unit test and the size of the entire build directory. [#60008](https://github.com/PaddlePaddle/Paddle/pull/60008),[#60960](https://github.com/PaddlePaddle/Paddle/pull/60960),[#60960](https://github.com/PaddlePaddle/Paddle/pull/60960),[#60961](https://github.com/PaddlePaddle/Paddle/pull/60961),[#60831](https://github.com/PaddlePaddle/Paddle/pull/60831),[#60832](https://github.com/PaddlePaddle/Paddle/pull/60832),[#60833](https://github.com/PaddlePaddle/Paddle/pull/60833),[#61372](https://github.com/PaddlePaddle/Paddle/pull/61372),[#60834](https://github.com/PaddlePaddle/Paddle/pull/60834),[#61374](https://github.com/PaddlePaddle/Paddle/pull/61374),[#61463](https://github.com/PaddlePaddle/Paddle/pull/61463),[#61376](https://github.com/PaddlePaddle/Paddle/pull/61376),[#60830](https://github.com/PaddlePaddle/Paddle/pull/60830),[#61373](https://github.com/PaddlePaddle/Paddle/pull/61373),[#61672](https://github.com/PaddlePaddle/Paddle/pull/61672),[#61375](https://github.com/PaddlePaddle/Paddle/pull/61375),[#61676](https://github.com/PaddlePaddle/Paddle/pull/61676),[#62036](https://github.com/PaddlePaddle/Paddle/pull/62036),[#61945](https://github.com/PaddlePaddle/Paddle/pull/61945),[#61675](https://github.com/PaddlePaddle/Paddle/pull/61675),[#61674](https://github.com/PaddlePaddle/Paddle/pull/61674),[#62773](https://github.com/PaddlePaddle/Paddle/pull/62773),[#61238](https://github.com/PaddlePaddle/Paddle/pull/61238),[#59988](https://github.com/PaddlePaddle/Paddle/pull/59988),[#60307](https://github.com/PaddlePaddle/Paddle/pull/60307),[#59612](https://github.com/PaddlePaddle/Paddle/pull/59612),[#59942](https://github.com/PaddlePaddle/Paddle/pull/59942),[#59968](https://github.com/PaddlePaddle/Paddle/pull/59968),[#59978](https://github.com/PaddlePaddle/Paddle/pull/59978),[#60121](https://github.com/PaddlePaddle/Paddle/pull/60121),[#60149](https://github.com/PaddlePaddle/Paddle/pull/60149),[#60161](https://github.com/PaddlePaddle/Paddle/pull/60161),[
#60160](https://github.com/PaddlePaddle/Paddle/pull/60160),[#60230](https://github.com/PaddlePaddle/Paddle/pull/60230),[#60154](https://github.com/PaddlePaddle/Paddle/pull/60154),[#60356](https://github.com/PaddlePaddle/Paddle/pull/60356),[#60392](https://github.com/PaddlePaddle/Paddle/pull/60392),[#60517](https://github.com/PaddlePaddle/Paddle/pull/60517),[#61131](https://github.com/PaddlePaddle/Paddle/pull/61131),[#60959](https://github.com/PaddlePaddle/Paddle/pull/60959) -- Add the support for Clang compiler. Users can now use Clang to compile, enjoying faster compilation speed and better error message prompts. [#63382](https://github.com/PaddlePaddle/Paddle/pull/63382),[#63133](https://github.com/PaddlePaddle/Paddle/pull/63133),[#61705](https://github.com/PaddlePaddle/Paddle/pull/61705),[#63152](https://github.com/PaddlePaddle/Paddle/pull/63152),[#63373](https://github.com/PaddlePaddle/Paddle/pull/63373) +- Added a GitHub Actions mechanism + [#70571](https://github.com/PaddlePaddle/Paddle/pull/70571), [#70626](https://github.com/PaddlePaddle/Paddle/pull/70626), [#71325](https://github.com/PaddlePaddle/Paddle/pull/71325), [#71344](https://github.com/PaddlePaddle/Paddle/pull/71344), [#71353](https://github.com/PaddlePaddle/Paddle/pull/71353), [#71322](https://github.com/PaddlePaddle/Paddle/pull/71322), [#70415](https://github.com/PaddlePaddle/Paddle/pull/70415), [#70465](https://github.com/PaddlePaddle/Paddle/pull/70465), [#70524](https://github.com/PaddlePaddle/Paddle/pull/70524), [#70550](https://github.com/PaddlePaddle/Paddle/pull/70550), [#70564](https://github.com/PaddlePaddle/Paddle/pull/70564), [#70579](https://github.com/PaddlePaddle/Paddle/pull/70579), [#70580](https://github.com/PaddlePaddle/Paddle/pull/70580), [#70963](https://github.com/PaddlePaddle/Paddle/pull/70963), [#71200](https://github.com/PaddlePaddle/Paddle/pull/71200), [#71261](https://github.com/PaddlePaddle/Paddle/pull/71261), [#71265](https://github.com/PaddlePaddle/Paddle/pull/71265) -###
CI Pipeline Improvements +### Discarded -- Improve the merge-in code monitoring mechanism in the CI pipeline, to ensure higher code quality and stability. Add a function monitoring module, to monitor various indicators of the CI pipeline in real time, ensuring smooth execution of each stage, to identify and resolve issues in a timely manner. [#61384](https://github.com/PaddlePaddle/Paddle/pull/61384),[#62190](https://github.com/PaddlePaddle/Paddle/pull/62190),[#60758](https://github.com/PaddlePaddle/Paddle/pull/60758),[#60399](https://github.com/PaddlePaddle/Paddle/pull/60399),[#58623](https://github.com/PaddlePaddle/Paddle/pull/58623),[#62177](https://github.com/PaddlePaddle/Paddle/pull/62177),[#62361](https://github.com/PaddlePaddle/Paddle/pull/62361),[#62893](https://github.com/PaddlePaddle/Paddle/pull/62893),[#63705](https://github.com/PaddlePaddle/Paddle/pull/63705),[#64476](https://github.com/PaddlePaddle/Paddle/pull/64476),[#64752](https://github.com/PaddlePaddle/Paddle/pull/64752),[#64733](https://github.com/PaddlePaddle/Paddle/pull/64733),[#61914](https://github.com/PaddlePaddle/Paddle/pull/61914) +- Cleaned up obsolete code and dependencies, including removing Python libraries that are no longer needed and simplifying compilation configurations to reduce maintenance costs + [#65635](https://github.com/PaddlePaddle/Paddle/pull/65635), [#67542](https://github.com/PaddlePaddle/Paddle/pull/67542), [#67609](https://github.com/PaddlePaddle/Paddle/pull/67604), [#69572](https://github.com/PaddlePaddle/Paddle/pull/69572), [#68150](https://github.com/PaddlePaddle/Paddle/pull/68150), [#67604](https://github.com/PaddlePaddle/Paddle/pull/67604), [#68561](https://github.com/PaddlePaddle/Paddle/pull/68561), [#68904](https://github.com/PaddlePaddle/Paddle/pull/68904), [#67219](https://github.com/PaddlePaddle/Paddle/pull/67219) -### Code Cleanup +## 10. Others -- Remove some old codes.
[#63580](https://github.com/PaddlePaddle/Paddle/pull/63580),[#62840](https://github.com/PaddlePaddle/Paddle/pull/62840),[#62886](https://github.com/PaddlePaddle/Paddle/pull/62886),[#63046](https://github.com/PaddlePaddle/Paddle/pull/63046),[#63004](https://github.com/PaddlePaddle/Paddle/pull/63004),[#63039](https://github.com/PaddlePaddle/Paddle/pull/63039),[#62733](https://github.com/PaddlePaddle/Paddle/pull/62733),[#62773](https://github.com/PaddlePaddle/Paddle/pull/62773),[#62768](https://github.com/PaddlePaddle/Paddle/pull/62768),[#62744](https://github.com/PaddlePaddle/Paddle/pull/62744),[#62861](https://github.com/PaddlePaddle/Paddle/pull/62861),[#62774](https://github.com/PaddlePaddle/Paddle/pull/62774),[#62851](https://github.com/PaddlePaddle/Paddle/pull/62851),[#62973](https://github.com/PaddlePaddle/Paddle/pull/62973),[#63273](https://github.com/PaddlePaddle/Paddle/pull/63273),[#62445](https://github.com/PaddlePaddle/Paddle/pull/62445),[#64382](https://github.com/PaddlePaddle/Paddle/pull/64382),[#64409](https://github.com/PaddlePaddle/Paddle/pull/64409),[#64391](https://github.com/PaddlePaddle/Paddle/pull/64391),[#64310](https://github.com/PaddlePaddle/Paddle/pull/64310),[#64348](https://github.com/PaddlePaddle/Paddle/pull/64348),[#64651](https://github.com/PaddlePaddle/Paddle/pull/64651),[#64709](https://github.com/PaddlePaddle/Paddle/pull/64709),[#61714](https://github.com/PaddlePaddle/Paddle/pull/61714),[#62109](https://github.com/PaddlePaddle/Paddle/pull/62109),[#61751](https://github.com/PaddlePaddle/Paddle/pull/61751),[#61691](https://github.com/PaddlePaddle/Paddle/pull/61691),[#61735](https://github.com/PaddlePaddle/Paddle/pull/61735) +- Changes unrelated to user usage, including cleanup of obsolete code, code migration, cleanup of unit tests, debugging, or upgrades to monitoring mechanisms. -### Bug Fixing +### Developer-related content -- Fix several compilation issues of paddle framework. 
[#63297](https://github.com/PaddlePaddle/Paddle/pull/63297),[#62994](https://github.com/PaddlePaddle/Paddle/pull/62994),[#62651](https://github.com/PaddlePaddle/Paddle/pull/62651),[#64408](https://github.com/PaddlePaddle/Paddle/pull/64408),[#60934](https://github.com/PaddlePaddle/Paddle/pull/60934),[#62899](https://github.com/PaddlePaddle/Paddle/pull/62899),[#60528](https://github.com/PaddlePaddle/Paddle/pull/60528),[#63158](https://github.com/PaddlePaddle/Paddle/pull/63158),[#64549](https://github.com/PaddlePaddle/Paddle/pull/64549),[#62351](https://github.com/PaddlePaddle/Paddle/pull/62351),[#61259](https://github.com/PaddlePaddle/Paddle/pull/61259),[#61281](https://github.com/PaddlePaddle/Paddle/pull/61281),[#62304](https://github.com/PaddlePaddle/Paddle/pull/62304),[#60736](https://github.com/PaddlePaddle/Paddle/pull/60736),[#60811](https://github.com/PaddlePaddle/Paddle/pull/60811),[#63949](https://github.com/PaddlePaddle/Paddle/pull/63949),[#59892](https://github.com/PaddlePaddle/Paddle/pull/59892),[#60767](https://github.com/PaddlePaddle/Paddle/pull/60767),[#60856](https://github.com/PaddlePaddle/Paddle/pull/60856),[#61286](https://github.com/PaddlePaddle/Paddle/pull/61286),[#61638](https://github.com/PaddlePaddle/Paddle/pull/61638),[#62079](https://github.com/PaddlePaddle/Paddle/pull/62079),[#62142](https://github.com/PaddlePaddle/Paddle/pull/62142),[#62823](https://github.com/PaddlePaddle/Paddle/pull/62823),[#62814](https://github.com/PaddlePaddle/Paddle/pull/62814),[#62425](https://github.com/PaddlePaddle/Paddle/pull/62425),[#62619](https://github.com/PaddlePaddle/Paddle/pull/62619),[#60207](https://github.com/PaddlePaddle/Paddle/pull/60207),[#60765](https://github.com/PaddlePaddle/Paddle/pull/60765),[#61870](https://github.com/PaddlePaddle/Paddle/pull/61870),[#61923](https://github.com/PaddlePaddle/Paddle/pull/61923),[#62144](https://github.com/PaddlePaddle/Paddle/pull/62144),[#62426](https://github.com/PaddlePaddle/Paddle/pull/62426),[#63848](https://git
hub.com/PaddlePaddle/Paddle/pull/63848),[#60682](https://github.com/PaddlePaddle/Paddle/pull/60682),[#61369](https://github.com/PaddlePaddle/Paddle/pull/61369),[#62882](https://github.com/PaddlePaddle/Paddle/pull/62882),[#63944](https://github.com/PaddlePaddle/Paddle/pull/63944),[#64812](https://github.com/PaddlePaddle/Paddle/pull/64812),[#60654](https://github.com/PaddlePaddle/Paddle/pull/60654),[#60887](https://github.com/PaddlePaddle/Paddle/pull/60887),[#62058](https://github.com/PaddlePaddle/Paddle/pull/62058),[#64639](https://github.com/PaddlePaddle/Paddle/pull/64639),[#60115](https://github.com/PaddlePaddle/Paddle/pull/60115),[#61940](https://github.com/PaddlePaddle/Paddle/pull/61940),[#62614](https://github.com/PaddlePaddle/Paddle/pull/62614),[#59914](https://github.com/PaddlePaddle/Paddle/pull/59914),[#63762](https://github.com/PaddlePaddle/Paddle/pull/63762),[#60145](https://github.com/PaddlePaddle/Paddle/pull/60145),[#60285](https://github.com/PaddlePaddle/Paddle/pull/60285),[#60378](https://github.com/PaddlePaddle/Paddle/pull/60378),[#60393](https://github.com/PaddlePaddle/Paddle/pull/60393),[#61057](https://github.com/PaddlePaddle/Paddle/pull/61057),[#61058](https://github.com/PaddlePaddle/Paddle/pull/61058),[#61151](https://github.com/PaddlePaddle/Paddle/pull/61151),[#61347](https://github.com/PaddlePaddle/Paddle/pull/61347),[#61554](https://github.com/PaddlePaddle/Paddle/pull/61554),[#61844](https://github.com/PaddlePaddle/Paddle/pull/61844),[#62915](https://github.com/PaddlePaddle/Paddle/pull/62915),[#61852](https://github.com/PaddlePaddle/Paddle/pull/61852),[#61704](https://github.com/PaddlePaddle/Paddle/pull/61704),[#61991](https://github.com/PaddlePaddle/Paddle/pull/61991),[#62264](https://github.com/PaddlePaddle/Paddle/pull/62264),[#62762](https://github.com/PaddlePaddle/Paddle/pull/62762),[#63820](https://github.com/PaddlePaddle/Paddle/pull/63820),[#63864](https://github.com/PaddlePaddle/Paddle/pull/63864),[#65017](https://github.com/PaddlePaddle
/Paddle/pull/65017),[#61183](https://github.com/PaddlePaddle/Paddle/pull/61183),[#59866](https://github.com/PaddlePaddle/Paddle/pull/59866),[#61171](https://github.com/PaddlePaddle/Paddle/pull/61171),[#61290](https://github.com/PaddlePaddle/Paddle/pull/61290),[#61725](https://github.com/PaddlePaddle/Paddle/pull/61725),[#61614](https://github.com/PaddlePaddle/Paddle/pull/61614),[#61721](https://github.com/PaddlePaddle/Paddle/pull/61721),[#61494](https://github.com/PaddlePaddle/Paddle/pull/61494),[#61556](https://github.com/PaddlePaddle/Paddle/pull/61556),[#61689](https://github.com/PaddlePaddle/Paddle/pull/61689) +- Remove useless debugging code and migrate code + [#65256](https://github.com/PaddlePaddle/Paddle/pull/65256), [#65782](https://github.com/PaddlePaddle/Paddle/pull/65782), [#65836](https://github.com/PaddlePaddle/Paddle/pull/65836), [#65840](https://github.com/PaddlePaddle/Paddle/pull/65840), [#65862](https://github.com/PaddlePaddle/Paddle/pull/65862), [#65863](https://github.com/PaddlePaddle/Paddle/pull/65863), [#65987](https://github.com/PaddlePaddle/Paddle/pull/65987), [#66547](https://github.com/PaddlePaddle/Paddle/pull/66547), [#66556](https://github.com/PaddlePaddle/Paddle/pull/66556), [#66645](https://github.com/PaddlePaddle/Paddle/pull/66645), [#66646](https://github.com/PaddlePaddle/Paddle/pull/66646), [#66648](https://github.com/PaddlePaddle/Paddle/pull/66648), [#66672](https://github.com/PaddlePaddle/Paddle/pull/66672), [#66783](https://github.com/PaddlePaddle/Paddle/pull/66783), [#66083](https://github.com/PaddlePaddle/Paddle/pull/66083), [#65562](https://github.com/PaddlePaddle/Paddle/pull/65562), [#66564](https://github.com/PaddlePaddle/Paddle/pull/66564), [#66370](https://github.com/PaddlePaddle/Paddle/pull/66370), [#66912](https://github.com/PaddlePaddle/Paddle/pull/66912), [#66913](https://github.com/PaddlePaddle/Paddle/pull/66913), [#66914](https://github.com/PaddlePaddle/Paddle/pull/66914), 
[#66915](https://github.com/PaddlePaddle/Paddle/pull/66915), [#66664](https://github.com/PaddlePaddle/Paddle/pull/66664), [#66671](https://github.com/PaddlePaddle/Paddle/pull/66671), [#66121](https://github.com/PaddlePaddle/Paddle/pull/66121), [#65907](https://github.com/PaddlePaddle/Paddle/pull/65907), [#65949](https://github.com/PaddlePaddle/Paddle/pull/65949), [#65950](https://github.com/PaddlePaddle/Paddle/pull/65950), [#65954](https://github.com/PaddlePaddle/Paddle/pull/65954), [#66545](https://github.com/PaddlePaddle/Paddle/pull/66545), [#66649](https://github.com/PaddlePaddle/Paddle/pull/66649), [#66900](https://github.com/PaddlePaddle/Paddle/pull/66900), [#66901](https://github.com/PaddlePaddle/Paddle/pull/66901), [#66902](https://github.com/PaddlePaddle/Paddle/pull/66902), [#66903](https://github.com/PaddlePaddle/Paddle/pull/66903), [#66904](https://github.com/PaddlePaddle/Paddle/pull/66904), [#66906](https://github.com/PaddlePaddle/Paddle/pull/66906), [#66907](https://github.com/PaddlePaddle/Paddle/pull/66907), [#66908](https://github.com/PaddlePaddle/Paddle/pull/66908), [#66909](https://github.com/PaddlePaddle/Paddle/pull/66909), [#66549](https://github.com/PaddlePaddle/Paddle/pull/66549), [#66555](https://github.com/PaddlePaddle/Paddle/pull/66555), [#66647](https://github.com/PaddlePaddle/Paddle/pull/66647), [#66898](https://github.com/PaddlePaddle/Paddle/pull/66898), [#66886](https://github.com/PaddlePaddle/Paddle/pull/66886), [#66042](https://github.com/PaddlePaddle/Paddle/pull/66042), [#66043](https://github.com/PaddlePaddle/Paddle/pull/66043), [#66045](https://github.com/PaddlePaddle/Paddle/pull/66045), [#66046](https://github.com/PaddlePaddle/Paddle/pull/66046), [#65826](https://github.com/PaddlePaddle/Paddle/pull/65826), [#65825](https://github.com/PaddlePaddle/Paddle/pull/65825), [#65827](https://github.com/PaddlePaddle/Paddle/pull/65827), [#65829](https://github.com/PaddlePaddle/Paddle/pull/65829), 
[#65830](https://github.com/PaddlePaddle/Paddle/pull/65830), [#65831](https://github.com/PaddlePaddle/Paddle/pull/65831), [#66081](https://github.com/PaddlePaddle/Paddle/pull/66081), [#66082](https://github.com/PaddlePaddle/Paddle/pull/66082), [#66087](https://github.com/PaddlePaddle/Paddle/pull/66087), [#65980](https://github.com/PaddlePaddle/Paddle/pull/65980), [#65981](https://github.com/PaddlePaddle/Paddle/pull/65981), [#65983](https://github.com/PaddlePaddle/Paddle/pull/65983), [#65985](https://github.com/PaddlePaddle/Paddle/pull/65985), [#65979](https://github.com/PaddlePaddle/Paddle/pull/65979), [#65986](https://github.com/PaddlePaddle/Paddle/pull/65986), [#65988](https://github.com/PaddlePaddle/Paddle/pull/65988), [#65989](https://github.com/PaddlePaddle/Paddle/pull/65989), [#66682](https://github.com/PaddlePaddle/Paddle/pull/66682), [#66717](https://github.com/PaddlePaddle/Paddle/pull/66717), [#65802](https://github.com/PaddlePaddle/Paddle/pull/65802), [#66159](https://github.com/PaddlePaddle/Paddle/pull/66159), [#66147](https://github.com/PaddlePaddle/Paddle/pull/66147), [#66149](https://github.com/PaddlePaddle/Paddle/pull/66149), [#66150](https://github.com/PaddlePaddle/Paddle/pull/66150), [#65798](https://github.com/PaddlePaddle/Paddle/pull/65798), [#65731](https://github.com/PaddlePaddle/Paddle/pull/65731), [#66145](https://github.com/PaddlePaddle/Paddle/pull/66145), [#66086](https://github.com/PaddlePaddle/Paddle/pull/66086), [#65781](https://github.com/PaddlePaddle/Paddle/pull/65781), [#65837](https://github.com/PaddlePaddle/Paddle/pull/65837), [#65828](https://github.com/PaddlePaddle/Paddle/pull/65828), [#65864](https://github.com/PaddlePaddle/Paddle/pull/65864), [#65959](https://github.com/PaddlePaddle/Paddle/pull/65959), [#65706](https://github.com/PaddlePaddle/Paddle/pull/65706), [#66918](https://github.com/PaddlePaddle/Paddle/pull/66918), [#66191](https://github.com/PaddlePaddle/Paddle/pull/66191), 
[#66689](https://github.com/PaddlePaddle/Paddle/pull/66689), [#66808](https://github.com/PaddlePaddle/Paddle/pull/66808), [#65424](https://github.com/PaddlePaddle/Paddle/pull/65424), [#65452](https://github.com/PaddlePaddle/Paddle/pull/65452), [#65463](https://github.com/PaddlePaddle/Paddle/pull/65463), [#65478](https://github.com/PaddlePaddle/Paddle/pull/65478), [#65339](https://github.com/PaddlePaddle/Paddle/pull/65339) +- Standardize code namespaces + [#64755](https://github.com/PaddlePaddle/Paddle/pull/64755), [#64765](https://github.com/PaddlePaddle/Paddle/pull/64765), [#64767](https://github.com/PaddlePaddle/Paddle/pull/64767), [#64770](https://github.com/PaddlePaddle/Paddle/pull/64770), [#64775](https://github.com/PaddlePaddle/Paddle/pull/64775), [#64776](https://github.com/PaddlePaddle/Paddle/pull/64776), [#64757](https://github.com/PaddlePaddle/Paddle/pull/64757), [#64780](https://github.com/PaddlePaddle/Paddle/pull/64780), [#64777](https://github.com/PaddlePaddle/Paddle/pull/64777), [#64779](https://github.com/PaddlePaddle/Paddle/pull/64779), [#64758](https://github.com/PaddlePaddle/Paddle/pull/64758), [#64759](https://github.com/PaddlePaddle/Paddle/pull/64759), [#64762](https://github.com/PaddlePaddle/Paddle/pull/64762) +- Modify operator list + [#66573](https://github.com/PaddlePaddle/Paddle/pull/66573), [#65598](https://github.com/PaddlePaddle/Paddle/pull/65598), [#65100](https://github.com/PaddlePaddle/Paddle/pull/65100), [#65385](https://github.com/PaddlePaddle/Paddle/pull/65385), [#65192](https://github.com/PaddlePaddle/Paddle/pull/65192), [#65118](https://github.com/PaddlePaddle/Paddle/pull/65118), [#65108](https://github.com/PaddlePaddle/Paddle/pull/65108), [#65153](https://github.com/PaddlePaddle/Paddle/pull/65153), [#65465](https://github.com/PaddlePaddle/Paddle/pull/65465), [#65128](https://github.com/PaddlePaddle/Paddle/pull/65128), [#65420](https://github.com/PaddlePaddle/Paddle/pull/65420), 
[#65099](https://github.com/PaddlePaddle/Paddle/pull/65099), [#65207](https://github.com/PaddlePaddle/Paddle/pull/65207), [#66066](https://github.com/PaddlePaddle/Paddle/pull/66066), [#65400](https://github.com/PaddlePaddle/Paddle/pull/65400), [#65160](https://github.com/PaddlePaddle/Paddle/pull/65160), [#65195](https://github.com/PaddlePaddle/Paddle/pull/65195), [#65445](https://github.com/PaddlePaddle/Paddle/pull/65445), [#65479](https://github.com/PaddlePaddle/Paddle/pull/65479), [#65193](https://github.com/PaddlePaddle/Paddle/pull/65193), [#65401](https://github.com/PaddlePaddle/Paddle/pull/65401), [#66724](https://github.com/PaddlePaddle/Paddle/pull/66724), [#65164](https://github.com/PaddlePaddle/Paddle/pull/65164), [#65466](https://github.com/PaddlePaddle/Paddle/pull/65466), [#65661](https://github.com/PaddlePaddle/Paddle/pull/65661), [#65897](https://github.com/PaddlePaddle/Paddle/pull/65897), [#66022](https://github.com/PaddlePaddle/Paddle/pull/66022), [#65313](https://github.com/PaddlePaddle/Paddle/pull/65313), [#65616](https://github.com/PaddlePaddle/Paddle/pull/65616), [#65588](https://github.com/PaddlePaddle/Paddle/pull/65588), [#65174](https://github.com/PaddlePaddle/Paddle/pull/65174), [#65402](https://github.com/PaddlePaddle/Paddle/pull/65402), [#65154](https://github.com/PaddlePaddle/Paddle/pull/65154), [#65151](https://github.com/PaddlePaddle/Paddle/pull/65151), [#65098](https://github.com/PaddlePaddle/Paddle/pull/65098), [#64953](https://github.com/PaddlePaddle/Paddle/pull/64953), [#65122](https://github.com/PaddlePaddle/Paddle/pull/65122), [#65590](https://github.com/PaddlePaddle/Paddle/pull/65590), [#65152](https://github.com/PaddlePaddle/Paddle/pull/65152) +- The old executor function of the Paddle framework is being phased out + [#65077](https://github.com/PaddlePaddle/Paddle/pull/65077), [#65340](https://github.com/PaddlePaddle/Paddle/pull/65340) +- Error message prompt optimization + 
[#66668](https://github.com/PaddlePaddle/Paddle/pull/66668), [#66675](https://github.com/PaddlePaddle/Paddle/pull/66675), [#66605](https://github.com/PaddlePaddle/Paddle/pull/66605), [#66613](https://github.com/PaddlePaddle/Paddle/pull/66613), [#66507](https://github.com/PaddlePaddle/Paddle/pull/66507), [#66700](https://github.com/PaddlePaddle/Paddle/pull/66700), [#66739](https://github.com/PaddlePaddle/Paddle/pull/66739), [#66719](https://github.com/PaddlePaddle/Paddle/pull/66719), [#66733](https://github.com/PaddlePaddle/Paddle/pull/66733), [#66552](https://github.com/PaddlePaddle/Paddle/pull/66552), [#66548](https://github.com/PaddlePaddle/Paddle/pull/66548), [#66623](https://github.com/PaddlePaddle/Paddle/pull/66623), [#66702](https://github.com/PaddlePaddle/Paddle/pull/66702), [#66705](https://github.com/PaddlePaddle/Paddle/pull/66705), [#66718](https://github.com/PaddlePaddle/Paddle/pull/66718), [#66727](https://github.com/PaddlePaddle/Paddle/pull/66727), [#66860](https://github.com/PaddlePaddle/Paddle/pull/66860), [#66869](https://github.com/PaddlePaddle/Paddle/pull/66869), [#66933](https://github.com/PaddlePaddle/Paddle/pull/66933), [#66939](https://github.com/PaddlePaddle/Paddle/pull/66939), [#66553](https://github.com/PaddlePaddle/Paddle/pull/66553), [#66774](https://github.com/PaddlePaddle/Paddle/pull/66774), [#66794](https://github.com/PaddlePaddle/Paddle/pull/66794), [#66551](https://github.com/PaddlePaddle/Paddle/pull/66551), [#66540](https://github.com/PaddlePaddle/Paddle/pull/66540), [#66617](https://github.com/PaddlePaddle/Paddle/pull/66617), [#66841](https://github.com/PaddlePaddle/Paddle/pull/66841), [#66788](https://github.com/PaddlePaddle/Paddle/pull/66788), [#66954](https://github.com/PaddlePaddle/Paddle/pull/66954), [#66698](https://github.com/PaddlePaddle/Paddle/pull/66698), [#66782](https://github.com/PaddlePaddle/Paddle/pull/66782), [#66844](https://github.com/PaddlePaddle/Paddle/pull/66844), 
[#66443](https://github.com/PaddlePaddle/Paddle/pull/66443), [#66455](https://github.com/PaddlePaddle/Paddle/pull/66455), [#66517](https://github.com/PaddlePaddle/Paddle/pull/66517), [#66804](https://github.com/PaddlePaddle/Paddle/pull/66804), [#66802](https://github.com/PaddlePaddle/Paddle/pull/66802), [#66536](https://github.com/PaddlePaddle/Paddle/pull/66536), [#66707](https://github.com/PaddlePaddle/Paddle/pull/66707), [#66525](https://github.com/PaddlePaddle/Paddle/pull/66525), [#66753](https://github.com/PaddlePaddle/Paddle/pull/66753), [#66550](https://github.com/PaddlePaddle/Paddle/pull/66550), [#66857](https://github.com/PaddlePaddle/Paddle/pull/66857), [#66471](https://github.com/PaddlePaddle/Paddle/pull/66471), [#66628](https://github.com/PaddlePaddle/Paddle/pull/66628), [#66469](https://github.com/PaddlePaddle/Paddle/pull/66469), [#66775](https://github.com/PaddlePaddle/Paddle/pull/66775), [#66506](https://github.com/PaddlePaddle/Paddle/pull/66506), [#66780](https://github.com/PaddlePaddle/Paddle/pull/66780), [#66953](https://github.com/PaddlePaddle/Paddle/pull/66953), [#66695](https://github.com/PaddlePaddle/Paddle/pull/66695), [#66603](https://github.com/PaddlePaddle/Paddle/pull/66603), [#66491](https://github.com/PaddlePaddle/Paddle/pull/66491), [#66715](https://github.com/PaddlePaddle/Paddle/pull/66715), [#66632](https://github.com/PaddlePaddle/Paddle/pull/66632), [#66594](https://github.com/PaddlePaddle/Paddle/pull/66594), [#66615](https://github.com/PaddlePaddle/Paddle/pull/66615), [#66578](https://github.com/PaddlePaddle/Paddle/pull/66578), [#66534](https://github.com/PaddlePaddle/Paddle/pull/66534), [#66569](https://github.com/PaddlePaddle/Paddle/pull/66569), [#66529](https://github.com/PaddlePaddle/Paddle/pull/66529), [#66530](https://github.com/PaddlePaddle/Paddle/pull/66530), [#66522](https://github.com/PaddlePaddle/Paddle/pull/66522), [#66789](https://github.com/PaddlePaddle/Paddle/pull/66789), 
[#66600](https://github.com/PaddlePaddle/Paddle/pull/66600), [#66511](https://github.com/PaddlePaddle/Paddle/pull/66511), [#66512](https://github.com/PaddlePaddle/Paddle/pull/66512), [#66527](https://github.com/PaddlePaddle/Paddle/pull/66527), [#66518](https://github.com/PaddlePaddle/Paddle/pull/66518), [#66958](https://github.com/PaddlePaddle/Paddle/pull/66958), [#66532](https://github.com/PaddlePaddle/Paddle/pull/66532), [#65258](https://github.com/PaddlePaddle/Paddle/pull/65258), [#66487](https://github.com/PaddlePaddle/Paddle/pull/66487), [#66876](https://github.com/PaddlePaddle/Paddle/pull/66876), [#66832](https://github.com/PaddlePaddle/Paddle/pull/66832), [#66872](https://github.com/PaddlePaddle/Paddle/pull/66872), [#66830](https://github.com/PaddlePaddle/Paddle/pull/66830), [#66708](https://github.com/PaddlePaddle/Paddle/pull/66708), [#66502](https://github.com/PaddlePaddle/Paddle/pull/66502), [#66521](https://github.com/PaddlePaddle/Paddle/pull/66521), [#66592](https://github.com/PaddlePaddle/Paddle/pull/66592) -## Documentation-related Bug Fixing +### Discarded -- With the enhancement of API feature, some API documentations have been fixed and enhanced simultaneously. [#62875](https://github.com/PaddlePaddle/Paddle/pull/62875), [#59793](https://github.com/PaddlePaddle/Paddle/pull/59793), [#60002](https://github.com/PaddlePaddle/Paddle/pull/60002), [#59985](https://github.com/PaddlePaddle/Paddle/pull/59985), [#63365](https://github.com/PaddlePaddle/Paddle/pull/63365), [#60962](https://github.com/PaddlePaddle/Paddle/pull/60962), [#60942](https://github.com/PaddlePaddle/Paddle/pull/60942), [#64232](https://github.com/PaddlePaddle/Paddle/pull/64232), [#63255](https://github.com/PaddlePaddle/Paddle/pull/63255) -- Update/supplement API documentation. 
bernoulli_ ([#64504](https://github.com/PaddlePaddle/Paddle/pull/64504)), paddle.static.ctr_metric_bundle ([#60912](https://github.com/PaddlePaddle/Paddle/pull/60912)), LayerNorm ([#62928](https://github.com/PaddlePaddle/Paddle/pull/62928)), Sequential ([#63128](https://github.com/PaddlePaddle/Paddle/pull/63128)), paddle.summary ([#63121](https://github.com/PaddlePaddle/Paddle/pull/63121)), ShardOptimizer in AutoParallel ([#62933](https://github.com/PaddlePaddle/Paddle/pull/62933)), paddle.nccl.version ([#62480](https://github.com/PaddlePaddle/Paddle/pull/62480))
-- Update the Readme file. [#59883](https://github.com/PaddlePaddle/Paddle/pull/59883),[#60691](https://github.com/PaddlePaddle/Paddle/pull/60691),[#60749](https://github.com/PaddlePaddle/Paddle/pull/60749)
-- Update mkldnn to onednn. [#63199](https://github.com/PaddlePaddle/Paddle/pull/63199),[#63202](https://github.com/PaddlePaddle/Paddle/pull/63202),[#63215](https://github.com/PaddlePaddle/Paddle/pull/63215),[#63209](https://github.com/PaddlePaddle/Paddle/pull/63209)
-- Fix document rendering bugs. [#59725](https://github.com/PaddlePaddle/Paddle/pull/59725),[#60306](https://github.com/PaddlePaddle/Paddle/pull/60306)
-- Fix a lot of typos in the codes to enhance source readability.
[#60093](https://github.com/PaddlePaddle/Paddle/pull/60093),[#60603](https://github.com/PaddlePaddle/Paddle/pull/60603),[#60631](https://github.com/PaddlePaddle/Paddle/pull/60631),[#60679](https://github.com/PaddlePaddle/Paddle/pull/60679),[#60741](https://github.com/PaddlePaddle/Paddle/pull/60741),[#60770](https://github.com/PaddlePaddle/Paddle/pull/60770),[#60784](https://github.com/PaddlePaddle/Paddle/pull/60784),[#60825](https://github.com/PaddlePaddle/Paddle/pull/60825),[#60857](https://github.com/PaddlePaddle/Paddle/pull/60857),[#60891](https://github.com/PaddlePaddle/Paddle/pull/60891),[#60921](https://github.com/PaddlePaddle/Paddle/pull/60921),[#60920](https://github.com/PaddlePaddle/Paddle/pull/60920),[#60923](https://github.com/PaddlePaddle/Paddle/pull/60923),[#60928](https://github.com/PaddlePaddle/Paddle/pull/60928),[#60940](https://github.com/PaddlePaddle/Paddle/pull/60940),[#60936](https://github.com/PaddlePaddle/Paddle/pull/60936),[#60932](https://github.com/PaddlePaddle/Paddle/pull/60932),[#60935](https://github.com/PaddlePaddle/Paddle/pull/60935),[#60931](https://github.com/PaddlePaddle/Paddle/pull/60931),[#60951](https://github.com/PaddlePaddle/Paddle/pull/60951),[#60964](https://github.com/PaddlePaddle/Paddle/pull/60964),[#60965](https://github.com/PaddlePaddle/Paddle/pull/60965),[#60967](https://github.com/PaddlePaddle/Paddle/pull/60967),[#60972](https://github.com/PaddlePaddle/Paddle/pull/60972),[#60971](https://github.com/PaddlePaddle/Paddle/pull/60971),[#60980](https://github.com/PaddlePaddle/Paddle/pull/60980),[#60984](https://github.com/PaddlePaddle/Paddle/pull/60984),[#60985](https://github.com/PaddlePaddle/Paddle/pull/60985),[#60989](https://github.com/PaddlePaddle/Paddle/pull/60989),[#60990](https://github.com/PaddlePaddle/Paddle/pull/60990),[#60991](https://github.com/PaddlePaddle/Paddle/pull/60991),[#60992](https://github.com/PaddlePaddle/Paddle/pull/60992),[#60994](https://github.com/PaddlePaddle/Paddle/pull/60994),[#60995](https://git
hub.com/PaddlePaddle/Paddle/pull/60995),[#60996](https://github.com/PaddlePaddle/Paddle/pull/60996),[#61001](https://github.com/PaddlePaddle/Paddle/pull/61001),[#61000](https://github.com/PaddlePaddle/Paddle/pull/61000),[#60999](https://github.com/PaddlePaddle/Paddle/pull/60999),[#60998](https://github.com/PaddlePaddle/Paddle/pull/60998),[#61026](https://github.com/PaddlePaddle/Paddle/pull/61026),[#61009](https://github.com/PaddlePaddle/Paddle/pull/61009),[#61034](https://github.com/PaddlePaddle/Paddle/pull/61034),[#61033](https://github.com/PaddlePaddle/Paddle/pull/61033),[#61020](https://github.com/PaddlePaddle/Paddle/pull/61020),[#61092](https://github.com/PaddlePaddle/Paddle/pull/61092),[#61066](https://github.com/PaddlePaddle/Paddle/pull/61066),[#61063](https://github.com/PaddlePaddle/Paddle/pull/61063),[#61089](https://github.com/PaddlePaddle/Paddle/pull/61089),[#61071](https://github.com/PaddlePaddle/Paddle/pull/61071),[#61129](https://github.com/PaddlePaddle/Paddle/pull/61129),[#61128](https://github.com/PaddlePaddle/Paddle/pull/61128),[#61126](https://github.com/PaddlePaddle/Paddle/pull/61126),[#61123](https://github.com/PaddlePaddle/Paddle/pull/61123),[#61113](https://github.com/PaddlePaddle/Paddle/pull/61113),[#61189](https://github.com/PaddlePaddle/Paddle/pull/61189),[#61175](https://github.com/PaddlePaddle/Paddle/pull/61175),[#61153](https://github.com/PaddlePaddle/Paddle/pull/61153),[#61198](https://github.com/PaddlePaddle/Paddle/pull/61198),[#61206](https://github.com/PaddlePaddle/Paddle/pull/61206),[#61256](https://github.com/PaddlePaddle/Paddle/pull/61256),[#61255](https://github.com/PaddlePaddle/Paddle/pull/61255),[#61251](https://github.com/PaddlePaddle/Paddle/pull/61251),[#61246](https://github.com/PaddlePaddle/Paddle/pull/61246),[#61245](https://github.com/PaddlePaddle/Paddle/pull/61245),[#61231](https://github.com/PaddlePaddle/Paddle/pull/61231),[#61247](https://github.com/PaddlePaddle/Paddle/pull/61247),[#61265](https://github.com/PaddlePaddle
/Paddle/pull/61265),[#61264](https://github.com/PaddlePaddle/Paddle/pull/61264),[#61266](https://github.com/PaddlePaddle/Paddle/pull/61266),[#61267](https://github.com/PaddlePaddle/Paddle/pull/61267),[#61268](https://github.com/PaddlePaddle/Paddle/pull/61268),[#61270](https://github.com/PaddlePaddle/Paddle/pull/61270),[#61334](https://github.com/PaddlePaddle/Paddle/pull/61334),[#61392](https://github.com/PaddlePaddle/Paddle/pull/61392),[#61404](https://github.com/PaddlePaddle/Paddle/pull/61404),[#61318](https://github.com/PaddlePaddle/Paddle/pull/61318),[#61383](https://github.com/PaddlePaddle/Paddle/pull/61383),[#61306](https://github.com/PaddlePaddle/Paddle/pull/61306),[#61324](https://github.com/PaddlePaddle/Paddle/pull/61324),[#61426](https://github.com/PaddlePaddle/Paddle/pull/61426),[#61390](https://github.com/PaddlePaddle/Paddle/pull/61390),[#61419](https://github.com/PaddlePaddle/Paddle/pull/61419),[#61420](https://github.com/PaddlePaddle/Paddle/pull/61420),[#61408](https://github.com/PaddlePaddle/Paddle/pull/61408),[#61425](https://github.com/PaddlePaddle/Paddle/pull/61425),[#61557](https://github.com/PaddlePaddle/Paddle/pull/61557),[#61628](https://github.com/PaddlePaddle/Paddle/pull/61628),[#61652](https://github.com/PaddlePaddle/Paddle/pull/61652),[#61602](https://github.com/PaddlePaddle/Paddle/pull/61602),[#61558](https://github.com/PaddlePaddle/Paddle/pull/61558),[#61660](https://github.com/PaddlePaddle/Paddle/pull/61660),[#61423](https://github.com/PaddlePaddle/Paddle/pull/61423),[#61627](https://github.com/PaddlePaddle/Paddle/pull/61627),[#61685](https://github.com/PaddlePaddle/Paddle/pull/61685),[#61690](https://github.com/PaddlePaddle/Paddle/pull/61690),[#61727](https://github.com/PaddlePaddle/Paddle/pull/61727),[#61738](https://github.com/PaddlePaddle/Paddle/pull/61738),[#61740](https://github.com/PaddlePaddle/Paddle/pull/61740),[#61741](https://github.com/PaddlePaddle/Paddle/pull/61741),[#61743](https://github.com/PaddlePaddle/Paddle/pull/61743),
[#61744](https://github.com/PaddlePaddle/Paddle/pull/61744),[#61745](https://github.com/PaddlePaddle/Paddle/pull/61745),[#61761](https://github.com/PaddlePaddle/Paddle/pull/61761),[#61762](https://github.com/PaddlePaddle/Paddle/pull/61762),[#61764](https://github.com/PaddlePaddle/Paddle/pull/61764),[#61767](https://github.com/PaddlePaddle/Paddle/pull/61767),[#61768](https://github.com/PaddlePaddle/Paddle/pull/61768),[#61774](https://github.com/PaddlePaddle/Paddle/pull/61774),[#61781](https://github.com/PaddlePaddle/Paddle/pull/61781),[#61783](https://github.com/PaddlePaddle/Paddle/pull/61783),[#61757](https://github.com/PaddlePaddle/Paddle/pull/61757),[#61732](https://github.com/PaddlePaddle/Paddle/pull/61732),[#61776](https://github.com/PaddlePaddle/Paddle/pull/61776),[#61780](https://github.com/PaddlePaddle/Paddle/pull/61780),[#61730](https://github.com/PaddlePaddle/Paddle/pull/61730),[#61728](https://github.com/PaddlePaddle/Paddle/pull/61728),[#61633](https://github.com/PaddlePaddle/Paddle/pull/61633),[#61720](https://github.com/PaddlePaddle/Paddle/pull/61720),[#61734](https://github.com/PaddlePaddle/Paddle/pull/61734),[#61779](https://github.com/PaddlePaddle/Paddle/pull/61779),[#61775](https://github.com/PaddlePaddle/Paddle/pull/61775),[#61773](https://github.com/PaddlePaddle/Paddle/pull/61773),[#61787](https://github.com/PaddlePaddle/Paddle/pull/61787),[#61687](https://github.com/PaddlePaddle/Paddle/pull/61687),[#61747](https://github.com/PaddlePaddle/Paddle/pull/61747),[#61760](https://github.com/PaddlePaddle/Paddle/pull/61760),[#61782](https://github.com/PaddlePaddle/Paddle/pull/61782),[#61800](https://github.com/PaddlePaddle/Paddle/pull/61800),[#61748](https://github.com/PaddlePaddle/Paddle/pull/61748),[#61772](https://github.com/PaddlePaddle/Paddle/pull/61772),[#61786](https://github.com/PaddlePaddle/Paddle/pull/61786),[#61880](https://github.com/PaddlePaddle/Paddle/pull/61880),[#61718](https://github.com/PaddlePaddle/Paddle/pull/61718),[#61742](https://git
hub.com/PaddlePaddle/Paddle/pull/61742),[#61766](https://github.com/PaddlePaddle/Paddle/pull/61766),[#61835](https://github.com/PaddlePaddle/Paddle/pull/61835),[#61838](https://github.com/PaddlePaddle/Paddle/pull/61838),[#61754](https://github.com/PaddlePaddle/Paddle/pull/61754),[#61833](https://github.com/PaddlePaddle/Paddle/pull/61833),[#61749](https://github.com/PaddlePaddle/Paddle/pull/61749),[#61938](https://github.com/PaddlePaddle/Paddle/pull/61938),[#61919](https://github.com/PaddlePaddle/Paddle/pull/61919),[#61924](https://github.com/PaddlePaddle/Paddle/pull/61924),[#61778](https://github.com/PaddlePaddle/Paddle/pull/61778),[#61839](https://github.com/PaddlePaddle/Paddle/pull/61839),[#61879](https://github.com/PaddlePaddle/Paddle/pull/61879),[#61929](https://github.com/PaddlePaddle/Paddle/pull/61929),[#61801](https://github.com/PaddlePaddle/Paddle/pull/61801),[#61788](https://github.com/PaddlePaddle/Paddle/pull/61788),[#61999](https://github.com/PaddlePaddle/Paddle/pull/61999),[#61928](https://github.com/PaddlePaddle/Paddle/pull/61928),[#61958](https://github.com/PaddlePaddle/Paddle/pull/61958),[#61982](https://github.com/PaddlePaddle/Paddle/pull/61982),[#61996](https://github.com/PaddlePaddle/Paddle/pull/61996),[#61953](https://github.com/PaddlePaddle/Paddle/pull/61953),[#61998](https://github.com/PaddlePaddle/Paddle/pull/61998),[#62003](https://github.com/PaddlePaddle/Paddle/pull/62003),[#61921](https://github.com/PaddlePaddle/Paddle/pull/61921),[#61881](https://github.com/PaddlePaddle/Paddle/pull/61881),[#61746](https://github.com/PaddlePaddle/Paddle/pull/61746),[#61955](https://github.com/PaddlePaddle/Paddle/pull/61955),[#62002](https://github.com/PaddlePaddle/Paddle/pull/62002),[#62001](https://github.com/PaddlePaddle/Paddle/pull/62001),[#61997](https://github.com/PaddlePaddle/Paddle/pull/61997),[#61765](https://github.com/PaddlePaddle/Paddle/pull/61765),[#61956](https://github.com/PaddlePaddle/Paddle/pull/61956),[#62004](https://github.com/PaddlePaddle
/Paddle/pull/62004),[#62044](https://github.com/PaddlePaddle/Paddle/pull/62044),[#62040](https://github.com/PaddlePaddle/Paddle/pull/62040),[#62043](https://github.com/PaddlePaddle/Paddle/pull/62043),[#62042](https://github.com/PaddlePaddle/Paddle/pull/62042),[#62041](https://github.com/PaddlePaddle/Paddle/pull/62041),[#62039](https://github.com/PaddlePaddle/Paddle/pull/62039),[#62019](https://github.com/PaddlePaddle/Paddle/pull/62019),[#61910](https://github.com/PaddlePaddle/Paddle/pull/61910),[#61882](https://github.com/PaddlePaddle/Paddle/pull/61882),[#61836](https://github.com/PaddlePaddle/Paddle/pull/61836),[#62013](https://github.com/PaddlePaddle/Paddle/pull/62013),[#62055](https://github.com/PaddlePaddle/Paddle/pull/62055),[#62047](https://github.com/PaddlePaddle/Paddle/pull/62047),[#62000](https://github.com/PaddlePaddle/Paddle/pull/62000),[#62048](https://github.com/PaddlePaddle/Paddle/pull/62048),[#62075](https://github.com/PaddlePaddle/Paddle/pull/62075),[#62038](https://github.com/PaddlePaddle/Paddle/pull/62038),[#62045](https://github.com/PaddlePaddle/Paddle/pull/62045),[#62105](https://github.com/PaddlePaddle/Paddle/pull/62105),[#62214](https://github.com/PaddlePaddle/Paddle/pull/62214),[#62212](https://github.com/PaddlePaddle/Paddle/pull/62212),[#62183](https://github.com/PaddlePaddle/Paddle/pull/62183),[#62182](https://github.com/PaddlePaddle/Paddle/pull/62182),[#62181](https://github.com/PaddlePaddle/Paddle/pull/62181),[#62179](https://github.com/PaddlePaddle/Paddle/pull/62179),[#62178](https://github.com/PaddlePaddle/Paddle/pull/62178),[#62172](https://github.com/PaddlePaddle/Paddle/pull/62172),[#62168](https://github.com/PaddlePaddle/Paddle/pull/62168),[#62163](https://github.com/PaddlePaddle/Paddle/pull/62163),[#62162](https://github.com/PaddlePaddle/Paddle/pull/62162),[#62161](https://github.com/PaddlePaddle/Paddle/pull/62161),[#62160](https://github.com/PaddlePaddle/Paddle/pull/62160),[#62046](https://github.com/PaddlePaddle/Paddle/pull/62046),
[#62175](https://github.com/PaddlePaddle/Paddle/pull/62175),[#62259](https://github.com/PaddlePaddle/Paddle/pull/62259),[#62258](https://github.com/PaddlePaddle/Paddle/pull/62258),[#62213](https://github.com/PaddlePaddle/Paddle/pull/62213),[#62260](https://github.com/PaddlePaddle/Paddle/pull/62260),[#62290](https://github.com/PaddlePaddle/Paddle/pull/62290),[#62288](https://github.com/PaddlePaddle/Paddle/pull/62288),[#62323](https://github.com/PaddlePaddle/Paddle/pull/62323),[#62319](https://github.com/PaddlePaddle/Paddle/pull/62319),[#62331](https://github.com/PaddlePaddle/Paddle/pull/62331),[#62330](https://github.com/PaddlePaddle/Paddle/pull/62330),[#62329](https://github.com/PaddlePaddle/Paddle/pull/62329),[#62324](https://github.com/PaddlePaddle/Paddle/pull/62324),[#62317](https://github.com/PaddlePaddle/Paddle/pull/62317),[#62311](https://github.com/PaddlePaddle/Paddle/pull/62311),[#62310](https://github.com/PaddlePaddle/Paddle/pull/62310),[#62308](https://github.com/PaddlePaddle/Paddle/pull/62308),[#62289](https://github.com/PaddlePaddle/Paddle/pull/62289),[#62307](https://github.com/PaddlePaddle/Paddle/pull/62307),[#62315](https://github.com/PaddlePaddle/Paddle/pull/62315),[#62406](https://github.com/PaddlePaddle/Paddle/pull/62406),[#62458](https://github.com/PaddlePaddle/Paddle/pull/62458),[#62459](https://github.com/PaddlePaddle/Paddle/pull/62459),[#62481](https://github.com/PaddlePaddle/Paddle/pull/62481),[#62465](https://github.com/PaddlePaddle/Paddle/pull/62465),[#62462](https://github.com/PaddlePaddle/Paddle/pull/62462),[#62453](https://github.com/PaddlePaddle/Paddle/pull/62453),[#62496](https://github.com/PaddlePaddle/Paddle/pull/62496),[#62457](https://github.com/PaddlePaddle/Paddle/pull/62457),[#62537](https://github.com/PaddlePaddle/Paddle/pull/62537),[#62514](https://github.com/PaddlePaddle/Paddle/pull/62514),[#62548](https://github.com/PaddlePaddle/Paddle/pull/62548),[#62544](https://github.com/PaddlePaddle/Paddle/pull/62544),[#62575](https://git
hub.com/PaddlePaddle/Paddle/pull/62575),[#62463](https://github.com/PaddlePaddle/Paddle/pull/62463),[#62643](https://github.com/PaddlePaddle/Paddle/pull/62643),[#62803](https://github.com/PaddlePaddle/Paddle/pull/62803),[#62924](https://github.com/PaddlePaddle/Paddle/pull/62924),[#63037](https://github.com/PaddlePaddle/Paddle/pull/63037),[#63102](https://github.com/PaddlePaddle/Paddle/pull/63102),[#63139](https://github.com/PaddlePaddle/Paddle/pull/63139),[#63092](https://github.com/PaddlePaddle/Paddle/pull/63092),[#63147](https://github.com/PaddlePaddle/Paddle/pull/63147),[#60518](https://github.com/PaddlePaddle/Paddle/pull/60518),[#60485](https://github.com/PaddlePaddle/Paddle/pull/60485),[#61273](https://github.com/PaddlePaddle/Paddle/pull/61273),[#63429](https://github.com/PaddlePaddle/Paddle/pull/63429),[#61954](https://github.com/PaddlePaddle/Paddle/pull/61954)
+- Clean up abandoned code and useless unit tests
+ [#65894](https://github.com/PaddlePaddle/Paddle/pull/65894), [#66165](https://github.com/PaddlePaddle/Paddle/pull/66165), [#66293](https://github.com/PaddlePaddle/Paddle/pull/66293), [#66102](https://github.com/PaddlePaddle/Paddle/pull/66102), [#66442](https://github.com/PaddlePaddle/Paddle/pull/66442), [#66922](https://github.com/PaddlePaddle/Paddle/pull/66922), [#66531](https://github.com/PaddlePaddle/Paddle/pull/66531), [#65518](https://github.com/PaddlePaddle/Paddle/pull/65518), [#66800](https://github.com/PaddlePaddle/Paddle/pull/66800), [#66372](https://github.com/PaddlePaddle/Paddle/pull/66372), [#65902](https://github.com/PaddlePaddle/Paddle/pull/65902), [#65462](https://github.com/PaddlePaddle/Paddle/pull/65462), [#65327](https://github.com/PaddlePaddle/Paddle/pull/65327), [#65189](https://github.com/PaddlePaddle/Paddle/pull/65189), [#65181](https://github.com/PaddlePaddle/Paddle/pull/65181), [#66535](https://github.com/PaddlePaddle/Paddle/pull/66535), [#65383](https://github.com/PaddlePaddle/Paddle/pull/65383), 
[#65173](https://github.com/PaddlePaddle/Paddle/pull/65173), [#66429](https://github.com/PaddlePaddle/Paddle/pull/66429), [#66386](https://github.com/PaddlePaddle/Paddle/pull/66386), [#66447](https://github.com/PaddlePaddle/Paddle/pull/66447), [#66367](https://github.com/PaddlePaddle/Paddle/pull/66367), [#66160](https://github.com/PaddlePaddle/Paddle/pull/66160), [#65408](https://github.com/PaddlePaddle/Paddle/pull/65408), [#65433](https://github.com/PaddlePaddle/Paddle/pull/65433), [#65481](https://github.com/PaddlePaddle/Paddle/pull/65481), [#65444](https://github.com/PaddlePaddle/Paddle/pull/65444), [#65389](https://github.com/PaddlePaddle/Paddle/pull/65389), [#65663](https://github.com/PaddlePaddle/Paddle/pull/65663), [#65649](https://github.com/PaddlePaddle/Paddle/pull/65649), [#65629](https://github.com/PaddlePaddle/Paddle/pull/65629), [#66142](https://github.com/PaddlePaddle/Paddle/pull/66142), [#65796](https://github.com/PaddlePaddle/Paddle/pull/65796), [#66163](https://github.com/PaddlePaddle/Paddle/pull/66163), [#66291](https://github.com/PaddlePaddle/Paddle/pull/66291), [#65480](https://github.com/PaddlePaddle/Paddle/pull/65480), [#65495](https://github.com/PaddlePaddle/Paddle/pull/65495), [#65498](https://github.com/PaddlePaddle/Paddle/pull/65498), [#65503](https://github.com/PaddlePaddle/Paddle/pull/65503), [#65502](https://github.com/PaddlePaddle/Paddle/pull/65502), [#65501](https://github.com/PaddlePaddle/Paddle/pull/65501), [#65512](https://github.com/PaddlePaddle/Paddle/pull/65512), [#65528](https://github.com/PaddlePaddle/Paddle/pull/65528), [#65472](https://github.com/PaddlePaddle/Paddle/pull/65472), [#65390](https://github.com/PaddlePaddle/Paddle/pull/65390), [#65344](https://github.com/PaddlePaddle/Paddle/pull/65344), [#65384](https://github.com/PaddlePaddle/Paddle/pull/65384), [#65388](https://github.com/PaddlePaddle/Paddle/pull/65388), [#65198](https://github.com/PaddlePaddle/Paddle/pull/65198), 
[#65248](https://github.com/PaddlePaddle/Paddle/pull/65248), [#65443](https://github.com/PaddlePaddle/Paddle/pull/65443), [#65430](https://github.com/PaddlePaddle/Paddle/pull/65430)
-## Others
+## 11. List of contributors
-Non-user related changes, including deprecated code cleanup, useless unit test cleanup, debugging or upgrade of monitoring mechanism. [#63377](https://github.com/PaddlePaddle/Paddle/pull/63377),[#64106](https://github.com/PaddlePaddle/Paddle/pull/64106),[#64220](https://github.com/PaddlePaddle/Paddle/pull/64220),[#64293](https://github.com/PaddlePaddle/Paddle/pull/64293),[#64464](https://github.com/PaddlePaddle/Paddle/pull/64464),[#64944](https://github.com/PaddlePaddle/Paddle/pull/64944),[#63638](https://github.com/PaddlePaddle/Paddle/pull/63638),[#63732](https://github.com/PaddlePaddle/Paddle/pull/63732),[#63735](https://github.com/PaddlePaddle/Paddle/pull/63735),[#63826](https://github.com/PaddlePaddle/Paddle/pull/63826),[#63982](https://github.com/PaddlePaddle/Paddle/pull/63982),[#63737](https://github.com/PaddlePaddle/Paddle/pull/63737),[#64471](https://github.com/PaddlePaddle/Paddle/pull/64471),[#64574](https://github.com/PaddlePaddle/Paddle/pull/64574),[#64494](https://github.com/PaddlePaddle/Paddle/pull/64494),[#62775](https://github.com/PaddlePaddle/Paddle/pull/62775),[#63601](https://github.com/PaddlePaddle/Paddle/pull/63601),[#62564](https://github.com/PaddlePaddle/Paddle/pull/62564),[#63772](https://github.com/PaddlePaddle/Paddle/pull/63772),[#64719](https://github.com/PaddlePaddle/Paddle/pull/64719),[#61640](https://github.com/PaddlePaddle/Paddle/pull/61640),[#63459](https://github.com/PaddlePaddle/Paddle/pull/63459),[#64062](https://github.com/PaddlePaddle/Paddle/pull/64062),[#63480](https://github.com/PaddlePaddle/Paddle/pull/63480),[#63833](https://github.com/PaddlePaddle/Paddle/pull/63833)[#63673](https://github.com/PaddlePaddle/Paddle/pull/63673),[#63672](https://github.com/PaddlePaddle/Paddle/pull/63672),[#64131](https://github.
com/PaddlePaddle/Paddle/pull/64131),[#64156](https://github.com/PaddlePaddle/Paddle/pull/64156),[#64155](https://github.com/PaddlePaddle/Paddle/pull/64155),[#64159](https://github.com/PaddlePaddle/Paddle/pull/64159),[#63902](https://github.com/PaddlePaddle/Paddle/pull/63902),[#64230](https://github.com/PaddlePaddle/Paddle/pull/64230),[#64229](https://github.com/PaddlePaddle/Paddle/pull/64229),[#64236](https://github.com/PaddlePaddle/Paddle/pull/64236),[#64260](https://github.com/PaddlePaddle/Paddle/pull/64260),[#64175](https://github.com/PaddlePaddle/Paddle/pull/64175),[#64250](https://github.com/PaddlePaddle/Paddle/pull/64250),[#64269](https://github.com/PaddlePaddle/Paddle/pull/64269),[#64238](https://github.com/PaddlePaddle/Paddle/pull/64238),[#64349](https://github.com/PaddlePaddle/Paddle/pull/64349),[#64394](https://github.com/PaddlePaddle/Paddle/pull/64394),[#64402](https://github.com/PaddlePaddle/Paddle/pull/64402),[#64401](https://github.com/PaddlePaddle/Paddle/pull/64401),[#64388](https://github.com/PaddlePaddle/Paddle/pull/64388),[#64329](https://github.com/PaddlePaddle/Paddle/pull/64329),[#64502](https://github.com/PaddlePaddle/Paddle/pull/64502),[#64501](https://github.com/PaddlePaddle/Paddle/pull/64501),[#64515](https://github.com/PaddlePaddle/Paddle/pull/64515),[#64503](https://github.com/PaddlePaddle/Paddle/pull/64503),[#64514](https://github.com/PaddlePaddle/Paddle/pull/64514),[#64601](https://github.com/PaddlePaddle/Paddle/pull/64601),[#64564](https://github.com/PaddlePaddle/Paddle/pull/64564),[#64012](https://github.com/PaddlePaddle/Paddle/pull/64012),[#64697](https://github.com/PaddlePaddle/Paddle/pull/64697),[#64682](https://github.com/PaddlePaddle/Paddle/pull/64682),[#64051](https://github.com/PaddlePaddle/Paddle/pull/64051),[#63267](https://github.com/PaddlePaddle/Paddle/pull/63267),[#63426](https://github.com/PaddlePaddle/Paddle/pull/63426),[#63626](https://github.com/PaddlePaddle/Paddle/pull/63626),[#63257](https://github.com/PaddlePaddle/Pad
dle/pull/63257),[#63266](https://github.com/PaddlePaddle/Paddle/pull/63266),[#63468](https://github.com/PaddlePaddle/Paddle/pull/63468),[#63262](https://github.com/PaddlePaddle/Paddle/pull/63262),[#63248](https://github.com/PaddlePaddle/Paddle/pull/63248),[#63241](https://github.com/PaddlePaddle/Paddle/pull/63241),[#63252](https://github.com/PaddlePaddle/Paddle/pull/63252),[#63258](https://github.com/PaddlePaddle/Paddle/pull/63258),[#63235](https://github.com/PaddlePaddle/Paddle/pull/63235),[#63399](https://github.com/PaddlePaddle/Paddle/pull/63399),[#63488](https://github.com/PaddlePaddle/Paddle/pull/63488),[#63487](https://github.com/PaddlePaddle/Paddle/pull/63487),[#63466](https://github.com/PaddlePaddle/Paddle/pull/63466),[#63464](https://github.com/PaddlePaddle/Paddle/pull/63464),[#63483](https://github.com/PaddlePaddle/Paddle/pull/63483),[#63486](https://github.com/PaddlePaddle/Paddle/pull/63486),[#63475](https://github.com/PaddlePaddle/Paddle/pull/63475),[#63489](https://github.com/PaddlePaddle/Paddle/pull/63489),[#63470](https://github.com/PaddlePaddle/Paddle/pull/63470),[#63457](https://github.com/PaddlePaddle/Paddle/pull/63457),[#63493](https://github.com/PaddlePaddle/Paddle/pull/63493),[#63561](https://github.com/PaddlePaddle/Paddle/pull/63561),[#63584](https://github.com/PaddlePaddle/Paddle/pull/63584),[#63587](https://github.com/PaddlePaddle/Paddle/pull/63587),[#63586](https://github.com/PaddlePaddle/Paddle/pull/63586),[#63569](https://github.com/PaddlePaddle/Paddle/pull/63569),[#63559](https://github.com/PaddlePaddle/Paddle/pull/63559),[#63558](https://github.com/PaddlePaddle/Paddle/pull/63558),[#63555](https://github.com/PaddlePaddle/Paddle/pull/63555),[#63543](https://github.com/PaddlePaddle/Paddle/pull/63543),[#63589](https://github.com/PaddlePaddle/Paddle/pull/63589),[#63583](https://github.com/PaddlePaddle/Paddle/pull/63583),[#63565](https://github.com/PaddlePaddle/Paddle/pull/63565),[#63564](https://github.com/PaddlePaddle/Paddle/pull/63564),[#63
265](https://github.com/PaddlePaddle/Paddle/pull/63265),[#63562](https://github.com/PaddlePaddle/Paddle/pull/63562),[#63591](https://github.com/PaddlePaddle/Paddle/pull/63591),[#63460](https://github.com/PaddlePaddle/Paddle/pull/63460),[#63238](https://github.com/PaddlePaddle/Paddle/pull/63238),[#63631](https://github.com/PaddlePaddle/Paddle/pull/63631),[#63707](https://github.com/PaddlePaddle/Paddle/pull/63707),[#63714](https://github.com/PaddlePaddle/Paddle/pull/63714),[#63854](https://github.com/PaddlePaddle/Paddle/pull/63854),[#63929](https://github.com/PaddlePaddle/Paddle/pull/63929),[#63532](https://github.com/PaddlePaddle/Paddle/pull/63532),[#59628](https://github.com/PaddlePaddle/Paddle/pull/59628),[#62209](https://github.com/PaddlePaddle/Paddle/pull/62209),[#63742](https://github.com/PaddlePaddle/Paddle/pull/63742),[#60518](https://github.com/PaddlePaddle/Paddle/pull/60518),[#62078](https://github.com/PaddlePaddle/Paddle/pull/62078),[#62684](https://github.com/PaddlePaddle/Paddle/pull/62684),[#62723](https://github.com/PaddlePaddle/Paddle/pull/62723),[#64141](https://github.com/PaddlePaddle/Paddle/pull/64141),[#60404](https://github.com/PaddlePaddle/Paddle/pull/60404),[#64212](https://github.com/PaddlePaddle/Paddle/pull/64212),[#60652](https://github.com/PaddlePaddle/Paddle/pull/60652),[#64545](https://github.com/PaddlePaddle/Paddle/pull/64545),[#64477](https://github.com/PaddlePaddle/Paddle/pull/64477),[#64556](https://github.com/PaddlePaddle/Paddle/pull/64556),[#63160](https://github.com/PaddlePaddle/Paddle/pull/63160),[#63796](https://github.com/PaddlePaddle/Paddle/pull/63796),[#64693](https://github.com/PaddlePaddle/Paddle/pull/64693),[#64484](https://github.com/PaddlePaddle/Paddle/pull/64484),[#64677](https://github.com/PaddlePaddle/Paddle/pull/64677),[#64461](https://github.com/PaddlePaddle/Paddle/pull/64461),[#63189](https://github.com/PaddlePaddle/Paddle/pull/63189),[#63855](https://github.com/PaddlePaddle/Paddle/pull/63855),[#63896](https://github.
com/PaddlePaddle/Paddle/pull/63896),[#63193](https://github.com/PaddlePaddle/Paddle/pull/63193),[#63200](https://github.com/PaddlePaddle/Paddle/pull/63200),[#63406](https://github.com/PaddlePaddle/Paddle/pull/63406),[#61283](https://github.com/PaddlePaddle/Paddle/pull/61283),[#63607](https://github.com/PaddlePaddle/Paddle/pull/63607),[#64486](https://github.com/PaddlePaddle/Paddle/pull/64486),[#64004](https://github.com/PaddlePaddle/Paddle/pull/64004),[#63132](https://github.com/PaddlePaddle/Paddle/pull/63132),[#63553](https://github.com/PaddlePaddle/Paddle/pull/63553),[#63572](https://github.com/PaddlePaddle/Paddle/pull/63572),[#63794](https://github.com/PaddlePaddle/Paddle/pull/63794),[#63919](https://github.com/PaddlePaddle/Paddle/pull/63919),[#63980](https://github.com/PaddlePaddle/Paddle/pull/63980),[#62917](https://github.com/PaddlePaddle/Paddle/pull/62917),[#64451](https://github.com/PaddlePaddle/Paddle/pull/64451),[#63541](https://github.com/PaddlePaddle/Paddle/pull/63541),[#63703](https://github.com/PaddlePaddle/Paddle/pull/63703),[#64536](https://github.com/PaddlePaddle/Paddle/pull/64536),[#63264](https://github.com/PaddlePaddle/Paddle/pull/63264),[#63335](https://github.com/PaddlePaddle/Paddle/pull/63335),[#63841](https://github.com/PaddlePaddle/Paddle/pull/63841),[#64628](https://github.com/PaddlePaddle/Paddle/pull/64628),[#63419](https://github.com/PaddlePaddle/Paddle/pull/63419),[#62210](https://github.com/PaddlePaddle/Paddle/pull/62210),[#63557](https://github.com/PaddlePaddle/Paddle/pull/63557),[#63064](https://github.com/PaddlePaddle/Paddle/pull/63064),[#61442](https://github.com/PaddlePaddle/Paddle/pull/61442),[#63537](https://github.com/PaddlePaddle/Paddle/pull/63537),[#63839](https://github.com/PaddlePaddle/Paddle/pull/63839),[#60927](https://github.com/PaddlePaddle/Paddle/pull/60927),[#60566](https://github.com/PaddlePaddle/Paddle/pull/60566),[#60842](https://github.com/PaddlePaddle/Paddle/pull/60842),[#64612](https://github.com/PaddlePaddle/Pad
dle/pull/64612),[#60047](https://github.com/PaddlePaddle/Paddle/pull/60047),[#63898](https://github.com/PaddlePaddle/Paddle/pull/63898),[#60415](https://github.com/PaddlePaddle/Paddle/pull/60415),[#60474](https://github.com/PaddlePaddle/Paddle/pull/60474),[#60439](https://github.com/PaddlePaddle/Paddle/pull/60439),[#60565](https://github.com/PaddlePaddle/Paddle/pull/60565),[#64414](https://github.com/PaddlePaddle/Paddle/pull/64414),[#62526](https://github.com/PaddlePaddle/Paddle/pull/62526),[#54183](https://github.com/PaddlePaddle/Paddle/pull/54183),[#64096](https://github.com/PaddlePaddle/Paddle/pull/64096),[#61325](https://github.com/PaddlePaddle/Paddle/pull/61325),[#60629](https://github.com/PaddlePaddle/Paddle/pull/60629),[#61051](https://github.com/PaddlePaddle/Paddle/pull/61051),[#62103](https://github.com/PaddlePaddle/Paddle/pull/62103),[#63594](https://github.com/PaddlePaddle/Paddle/pull/63594),[#60968](https://github.com/PaddlePaddle/Paddle/pull/60968),[#64613](https://github.com/PaddlePaddle/Paddle/pull/64613),[#64073](https://github.com/PaddlePaddle/Paddle/pull/64073),[#63816](https://github.com/PaddlePaddle/Paddle/pull/63816),[#64416](https://github.com/PaddlePaddle/Paddle/pull/64416),[#62499](https://github.com/PaddlePaddle/Paddle/pull/62499),[#64531](https://github.com/PaddlePaddle/Paddle/pull/64531),[#63827](https://github.com/PaddlePaddle/Paddle/pull/63827),[#59885](https://github.com/PaddlePaddle/Paddle/pull/59885),[#59949](https://github.com/PaddlePaddle/Paddle/pull/59949),[#63428](https://github.com/PaddlePaddle/Paddle/pull/63428),[#63218](https://github.com/PaddlePaddle/Paddle/pull/63218),[#63538](https://github.com/PaddlePaddle/Paddle/pull/63538),[#64497](https://github.com/PaddlePaddle/Paddle/pull/64497),[#63082](https://github.com/PaddlePaddle/Paddle/pull/63082),[#64395](https://github.com/PaddlePaddle/Paddle/pull/64395),[#60183](https://github.com/PaddlePaddle/Paddle/pull/60183),[#63691](https://github.com/PaddlePaddle/Paddle/pull/63691),[#64
428](https://github.com/PaddlePaddle/Paddle/pull/64428),[#64648](https://github.com/PaddlePaddle/Paddle/pull/64648),[#64650](https://github.com/PaddlePaddle/Paddle/pull/64650),[#59926](https://github.com/PaddlePaddle/Paddle/pull/59926),[#59750](https://github.com/PaddlePaddle/Paddle/pull/59750),[#60080](https://github.com/PaddlePaddle/Paddle/pull/60080),[#60208](https://github.com/PaddlePaddle/Paddle/pull/60208),[#64124](https://github.com/PaddlePaddle/Paddle/pull/64124),[#64187](https://github.com/PaddlePaddle/Paddle/pull/64187),[#64166](https://github.com/PaddlePaddle/Paddle/pull/64166),[#64284](https://github.com/PaddlePaddle/Paddle/pull/64284),[#64253](https://github.com/PaddlePaddle/Paddle/pull/64253),[#64555](https://github.com/PaddlePaddle/Paddle/pull/64555),[#59878](https://github.com/PaddlePaddle/Paddle/pull/59878),[#64081](https://github.com/PaddlePaddle/Paddle/pull/64081)
+0x3878f, 0x45f, 2742195759, 86kkd, A-nnonymous, ADream-ki, Aganlengzi, Albresky, AndPuQing, AndSonder, Aoraki-Dream, ApricityXX, Asthestarsfalll, Aurelius84, BHmingyang, BeingGod, Betelgeu, BiynXu, CJ77Qi, Caogration, DDDivano, Dale1314, Deleter-D, DesmonDay, Difers, Dmovic, DongBaiYue, DrRyanHuang, DrownFish19, Eddie-Wang1120, EgoistSA, FeixLiu, ForFishes, Fripping, From00, Function-Samuel, GoldenStain, Guanhuachen2003, GuoxiaWang, Hanyonggong, HarperCy, Hongqing-work, HydrogenSulfate, JZ-LIANG, Jeff114514, JiaWenxuan, LLee233, LanCole, Lans1ot, Layssy, Leoforever123, LiYuRio, LielinJiang, LittleHeroZZZX, Liujie0926, Liyulingyue, Luohongzhige, Marcusryz, MarisaSparkL, Micalling, MikhayEeer, MrXnneHang, MufanColin, NKNaN, Neo-WY, NeroLoh, PolaKuma, Qin-sx, QingshuChen, RachelXu7, RichardWooSJTU, RuohengMa, SCUcookie, Sekiro-x, SigureMo, Sunny-bot1, SylarTiaNII, Sylence8, TBD1, TR666, TimeYWL, Tom-Zheng, Turingg, Victor-Bayim, Vvsmile, WAYKEN-TSE, Wanglongzhi2001, Wangzheee, Waynezee, Wennie396, Whsjrczr, Wizard-ZP, Wong4j, XavierZXY, XiaociZhang, XieYunshen, Xing-lil, Xreki, 
YKTian-x2b, YZW-explorer, YanhuiDua, YuanRisheng, ZHOU05030, ZhangHandi, ZhangX-21, ZibinGuo, a2064968462, anderson101866, aooxin, aquagull, baoqiwen, bapijun, blacksheep-Aristotle, bukejiyu, carryyu, ccsuzzh, chang-wenbin, changeyoung98, chen2016013, ckl117, cmcamdy, co63oc, continue-coding, cqulilujia, crazyxiaoxi, cszdrg, cubehan3, cyber-pioneer, danleifeng, decade-afk, deepllz, dynamicheart, eee4017, eggman-1024, enkilee, epiphanyer, ethan-sem, fangfangssj, feixi21, fightfat, fufu0615, fxfxfxfxfxfxfxfx, fxy1699, gitliuyf, gongel, gongshaotian, gongweibao, gouzil, gsq7474741, guixxiic, gzy19990617, hanyang2508, haoyu2022, heavyrain-lzy, houj04, huangjiyi, huangkr03, hxzd5568, icpcccpc, inaomIIsfarell, iosmers, jeff41404, jerrywgz, jiachengdai, jiahy0825, jinmingyi1998, jinyouzhi, joseflv, jychen21, jzhang533, kangguangli, kanze1, kineast, kircle888, l1cacheDell, leo0519, lifulll, linkk08, little1d, liufengwei0103, liuruyan, lixcli, liym27, liyongchao911, lizexu123, lizhenyun01, lj970926, lshpku, lszxb, ltd0924, luotao1, lwkhahaha, lxd-cumt, mayang002, megemini, mikemikimike, ming1753, monster1015, mori0umi, ndyysheep, nizne9, nobodynobody, ooooo-create, penPenf28, phlrain, pkuzyc, qili93, rich04lin, risemeup1, ronny1996, rsmallblue, runzhech, skywalker2012, smile2game, sneaxiy, successfulbarrier, sunzhongkai588, swgu98, tc20042008, tianhaodongbd, tianshuo78520a, tizhou86, tlxd, uanu2002, umiswing, vivienfanghuagood, waliwali777, walkalone20, wanghuancoder, wangna11BD, will-jl944, winffke, winter-wang, wwwuyan, xiaoguoguo626807, xiaoluomi, xiaoyao0115, xingmingyyj, xkkkkkk23, xu8117, xuxinyi389, xz-alex, yangrongxinuser, yeteye, yinfan98, yongqiangma, yuan20041218, yuanlehome, yuguo-Jack, yumin066, zbt78, zeroRains, zhangbo9674, zhanghonggeng, zhanglirong1999, zhangting2020, zhangyk0314, zhangyuqin1998, zhiminzhang0830, zhink, zhiqiu, zhouquan32, zhoutianzi666, zhwesky2010, zoooo0820, zrr1999, zty-king, zxcd, zyfncg