diff --git a/docs/hardware_support/dcu/install_cn.md b/docs/hardware_support/dcu/install_cn.md
index a32f38adb58..06885eb498c 100644
--- a/docs/hardware_support/dcu/install_cn.md
+++ b/docs/hardware_support/dcu/install_cn.md
@@ -77,16 +77,15 @@ DCU Temp AvgPwr Fan Perf PwrCap VRAM% DCU%
```bash
# 下载并安装 wheel 包
-python -m pip install --pre paddlepaddle-dcu -i https://www.paddlepaddle.org.cn/packages/nightly/dcu/
+python -m pip install paddlepaddle-dcu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/dcu/
```
-⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。
### 安装方式二:源代码编译安装
在启动的 docker 容器中,下载 Paddle 源码并编译,CMAKE 编译选项含义请参见[编译选项表](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#Compile)。
```bash
# 下载 Paddle 源码
-git clone https://github.com/PaddlePaddle/Paddle.git -b develop
+git clone https://github.com/PaddlePaddle/Paddle.git -b release/3.1
cd Paddle
# 创建编译目录
@@ -102,9 +101,8 @@ cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DCMAKE_CXX_FLAGS="-Wno-error -w" \
make -j16
# 编译产出在 build/python/dist/ 路径下,使用 pip 安装即可
-pip install -U paddlepaddle_dcu-0.0.0-cp310-cp310-linux_x86_64.whl
+pip install -U paddlepaddle_dcu-*-linux_x86_64.whl
```
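上面的通配符在 `build/python/dist/` 下存在多个 wheel 包时可能同时匹配多个文件。下面是一个示意性的 Python 片段,按 PEP 427 的文件名约定拆分 wheel 文件名,并筛选出与当前解释器 Python 标签一致的包。其中函数名 `parse_wheel_name`、`pick_wheel` 以及目录位置均为示例假设,并非飞桨提供的接口。

```python
import sys
from pathlib import Path


def parse_wheel_name(filename):
    """按 PEP 427 约定拆分 wheel 文件名,返回各字段组成的字典。"""
    if not filename.endswith(".whl"):
        raise ValueError(f"不是 wheel 文件: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # 五段:名称-版本-python 标签-abi 标签-平台标签;六段时第三段为构建号
    if len(parts) == 6:
        name, version, _build, py_tag, abi_tag, platform = parts
    elif len(parts) == 5:
        name, version, py_tag, abi_tag, platform = parts
    else:
        raise ValueError(f"不是合法的 wheel 文件名: {filename}")
    return {
        "name": name,
        "version": version,
        "python": py_tag,
        "abi": abi_tag,
        "platform": platform,
    }


def pick_wheel(filenames, py_tag=None):
    """返回 Python 标签与 py_tag 一致的 wheel 文件名列表。"""
    if py_tag is None:
        # 默认取当前解释器,例如 Python 3.10 对应 cp310
        py_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return [
        name
        for name in filenames
        if name.endswith(".whl") and parse_wheel_name(name)["python"] == py_tag
    ]


if __name__ == "__main__":
    # 假设:当前目录为 Paddle 源码根目录
    dist_dir = Path("build/python/dist")
    if dist_dir.is_dir():
        names = [p.name for p in dist_dir.glob("paddlepaddle_dcu-*.whl")]
        print(pick_wheel(names))
```

例如,对 `paddlepaddle_dcu-0.0.0-cp310-cp310-linux_x86_64.whl` 调用 `parse_wheel_name`,得到的 `python` 字段为 `cp310`。该片段只做精确匹配,不处理 `py2.py3` 这类合并标签。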
-⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。
## 基础功能检查
安装完成后,在 docker 容器中输入如下命令进行飞桨基础健康功能的检查。
diff --git a/docs/hardware_support/gcu/install_cn.md b/docs/hardware_support/gcu/install_cn.md
index eeabc7251bb..8b7fa46e604 100644
--- a/docs/hardware_support/gcu/install_cn.md
+++ b/docs/hardware_support/gcu/install_cn.md
@@ -25,13 +25,13 @@ lspci | grep S60
```bash
# 拉取镜像
-docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-gcu:topsrider3.2.109-ubuntu20-x86_64-gcc84
+docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-gcu:topsrider3.4.623-ubuntu20-x86_64-gcc84
```
```bash
# 参考如下命令启动容器
docker run --name paddle-gcu-dev -v /home:/home \
--network=host --ipc=host -it --privileged \
- ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-gcu:topsrider3.2.109-ubuntu20-x86_64-gcc84 /bin/bash
+ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-gcu:topsrider3.4.623-ubuntu20-x86_64-gcc84 /bin/bash
```
#### 选项说明及可调整参数
@@ -78,25 +78,24 @@ efsmi
```bash
# 先安装飞桨 CPU 安装包
-python -m pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu
+python -m pip install paddlepaddle==3.1.0a0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
# 再安装飞桨 GCU 插件包
-python -m pip install paddle-custom-gcu -i https://www.paddlepaddle.org.cn/packages/nightly/gcu
+python -m pip install paddle-custom-gcu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/gcu/
```
-⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。
### 安装方式二:源代码编译安装
在启动的 docker 容器中,先安装飞桨 CPU 安装包,再下载 PaddleCustomDevice 源码编译得到飞桨 GCU 插件包。
```bash
# 下载 PaddleCustomDevice 源码
-git clone https://github.com/PaddlePaddle/PaddleCustomDevice
+git clone https://github.com/PaddlePaddle/PaddleCustomDevice -b release/3.1
# 进入硬件后端(燧原 GCU)目录
cd PaddleCustomDevice/backends/gcu
# 先安装飞桨 CPU 安装包
-python -m pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu
+python -m pip install paddlepaddle==3.1.0a0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
# 执行编译命令 - submodule 在编译时会按需下载
mkdir -p build && cd build
@@ -118,7 +117,7 @@ python -c "import paddle_custom_device; paddle_custom_device.gcu.version()"
```
```bash
# 预期得到如下输出结果
-version: 3.0.0.dev20241206
+version: 3.1.0
commit: 7a2766768cc92aa94cc3d0ea6c23e8397f15f68a
TopsPlatform: 1.2.0.301
....
diff --git a/docs/hardware_support/hardware_info_cn.md b/docs/hardware_support/hardware_info_cn.md
index 92b38f5b375..8e6472c7315 100644
--- a/docs/hardware_support/hardware_info_cn.md
+++ b/docs/hardware_support/hardware_info_cn.md
@@ -17,6 +17,20 @@
| AI 加速芯片 | | 壁仞 | BR100、BR104 | | [源码编译](https://github.com/PaddlePaddle/PaddleCustomDevice/blob/develop/backends/biren_gpu/README_cn.md) |[代码仓库](https://github.com/PaddlePaddle/PaddleCustomDevice) |
| AI 加速芯片 | | 燧原 | 云燧 T20 、i20、S60 | | [源码编译](https://github.com/PaddlePaddle/PaddleCustomDevice/blob/develop/backends/gcu/README_cn.md) |[代码仓库](https://github.com/PaddlePaddle/PaddleCustomDevice) |
| AI 加速芯片 | | 太初 | 元碁系列 | [安装](./sdaa/install_cn.html#wheel) | [源码编译](https://github.com/PaddlePaddle/PaddleCustomDevice/blob/develop/backends/sdaa/README_cn.md) |[代码仓库](https://github.com/PaddlePaddle/PaddleCustomDevice) |
+| AI 加速芯片 | | 沐曦 | 曦云 C 系列 | [安装](./metax/install_cn.html#wheel) | [源码编译](https://github.com/PaddlePaddle/PaddleCustomDevice/blob/develop/backends/metax_gpu/README_cn.md) |[代码仓库](https://github.com/PaddlePaddle/PaddleCustomDevice) |
+
+## FastDeploy
+
+|分类|架构|公司|型号|使用指南|
+|-|-|-|-|-|
+| AI 加速卡 | | NVIDIA | Ada Lovelace、Hopper、Ampere 架构 | [使用指南](https://github.com/PaddlePaddle/FastDeploy) |
+| AI 加速卡 | XPU | 昆仑芯 | P800 | [使用指南](https://github.com/PaddlePaddle/FastDeploy/tree/develop/docs/get_started/installation/kunlunxin_xpu.md) |
+| AI 加速卡 | | 燧原 | S60 | [使用指南](https://github.com/PaddlePaddle/FastDeploy/tree/develop/docs/get_started/installation/Enflame_gcu.md) |
+| AI 加速卡 | GPGPU | 天数 | 天垓 150 | [使用指南](https://github.com/PaddlePaddle/FastDeploy/tree/develop/docs/get_started/installation/iluvatar_gpu.md) |
+| AI 加速卡 | GPGPU | 海光 | K100_AI | 适配中 |
+| AI 加速卡 | 达芬奇 | 昇腾 | 910 系列 | 适配中 |
+| AI 加速卡 | GPGPU | 沐曦 | 曦云 C 系列 | 适配中 |
+
## Paddle Inference
@@ -67,7 +81,7 @@
| AI 加速芯片 | 海飞科 | Compass C10 | ✔️ | [模型库](https://github.com/hexaflakeai/model_zoo) |
| AI 加速芯片 | 清微智能 | TX5368 | ✔️ | [模型库](https://github.com/tsingmicro-toolchain/ts.knight-modelzoo) |
| AI 加速芯片 | 爱芯元智 | AX620A | ✔️ | [模型库](https://github.com/AXERA-TECH/ax-samples/tree/main) |
-| AI 加速芯片 | 沐曦 | N100 | ✔️ | [模型库](https://github.com/denglin-github/DLPaddleModelZoo) |
+| AI 加速芯片 | 沐曦 | N100 | ✔️ | [模型库](https://gitee.com/metax-maca/modelzoo/tree/master/paddlepaddle) |
| AI 加速芯片 | 希姆计算 | STCP920 | ✔️ | [模型库](https://github.com/Stream-Computing/STCPaddleModelZoo) |
## TVM
diff --git a/docs/hardware_support/iluvatar_gpu/index_cn.rst b/docs/hardware_support/iluvatar_gpu/index_cn.rst
new file mode 100644
index 00000000000..04e0b2e4070
--- /dev/null
+++ b/docs/hardware_support/iluvatar_gpu/index_cn.rst
@@ -0,0 +1,16 @@
+.. _cn_iluvatar_information:
+
+####################
+天数 GPGPU 芯片
+####################
+
+天数 BI150 加速卡(`了解天数智芯 <https://www.iluvatar.com/>`_)是基于天数智芯自研通用 GPU 的训推一体加速卡,具备广通用性、强灵活性、高性价比的显著优势,支持市场主流生态,可广泛应用于主流大模型的预训练、微调和推理任务,以及通用计算、新算法研究等场景,赋能 AI 智能社会。
+
+飞桨框架支持基于天数 GPGPU 芯片的训练和推理,请参考以下内容快速体验:
+
+- `天数 GPGPU 安装说明 <./install_cn.html>`_ : 天数 GPGPU 安装说明
+
+.. toctree::
+ :hidden:
+
+ install_cn.md
diff --git a/docs/hardware_support/iluvatar_gpu/install_cn.md b/docs/hardware_support/iluvatar_gpu/install_cn.md
new file mode 100644
index 00000000000..26bce9f6e78
--- /dev/null
+++ b/docs/hardware_support/iluvatar_gpu/install_cn.md
@@ -0,0 +1,151 @@
+# 天数 GPGPU 安装说明
+
+飞桨框架 iluvatar_gpu 版支持天数 GPGPU 的训练和推理,提供两种安装方式:
+
+1. 通过飞桨官网发布的 wheel 包安装
+2. 通过源代码编译得到 wheel 包安装
+
+## 天数 GPGPU 系统要求
+
+| 要求类型 | 要求内容 |
+| --------- | -------- |
+| 芯片型号 | 天数智芯系列芯片,包括 BI150 |
+| 操作系统 | Linux 操作系统,包括 CentOS、Ubuntu、KylinV10 等 |
+
+## 运行环境准备
+
+推荐使用天数官方发布的天数 IXUCA 开发镜像,该镜像预装有天数 IXUCA 基础运行环境库。
+
+```bash
+# 拉取镜像
+docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-ixuca:latest
+```
+
+```bash
+# 在 host 上安装 driver
+wget https://ai-rank.bj.bcebos.com/Iluvatar/corex-driver-linux64-4.3.0.rc.9.20250624_x86_64_10.2.run
+bash corex-driver-linux64-4.3.0.rc.9.20250624_x86_64_10.2.run
+```
+
+```bash
+# 启动容器
+docker run -itd --name paddle-ixuca-dev -v /usr/src:/usr/src -v /lib/modules:/lib/modules \
+ -v /dev:/dev -v /home:/home --privileged --cap-add=ALL --pid=host \
+ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-ixuca:latest
+docker exec -it paddle-ixuca-dev bash
+```
+
+#### 选项说明及可调整参数
+
+##### ① `--name paddle-ixuca-dev`
+- **作用**:指定容器名称。
+- **可调整**:
+ - 用户可改为其他名称,例如 `paddle-ixuca-test`,方便区分不同实验。
+
+```bash
+# 检查容器内是否正常识别天数 GPGPU 设备
+ixsmi
+```
+
+```bash
+# 预期输出
++-----------------------------------------------------------------------------+
+| IX-ML: 4.3.0 Driver Version: 4.3.0 CUDA Version: 10.2 |
+|-------------------------------+----------------------+----------------------|
+| GPU Name | Bus-Id | Clock-SM Clock-Mem |
+| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
+|===============================+======================+======================|
+| 0 Iluvatar BI-V150 | 00000000:10:00.0 | 1500MHz 1600MHz |
+| N/A 40C P0 N/A / N/A | 64MiB / 32768MiB | 0% Default |
++-------------------------------+----------------------+----------------------+
+| 1 Iluvatar BI-V150 | 00000000:13:00.0 | 1500MHz 1600MHz |
+| N/A 39C P0 104W / 350W | 64MiB / 32768MiB | 0% Default |
++-------------------------------+----------------------+----------------------+
+
++-----------------------------------------------------------------------------+
+| Processes: GPU Memory |
+| GPU PID Process name Usage(MiB) |
+|=============================================================================|
+| No running processes found |
++-----------------------------------------------------------------------------+
+
+```
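如果希望在脚本中确认设备数量,可以参考下面的示意代码,从 `ixsmi` 的文本输出中提取设备编号和名称。这里假设设备行的格式与上文示例输出一致(以设备编号开头,名称以 `Iluvatar` 起始),正则表达式和函数名 `list_devices` 均为示例假设,实际输出格式可能随驱动版本变化。

```python
import re
import shutil
import subprocess

# 假设:设备行形如 "| 0 Iluvatar BI-V150 | 00000000:10:00.0 | ...",与上文示例输出一致
DEVICE_ROW = re.compile(r"^\|\s*(\d+)\s+(Iluvatar\s+\S+)\s*\|")


def list_devices(ixsmi_output):
    """从 ixsmi 的文本输出中解析出 (设备编号, 设备名称) 列表。"""
    devices = []
    for line in ixsmi_output.splitlines():
        match = DEVICE_ROW.match(line.strip())
        if match:
            devices.append((int(match.group(1)), match.group(2)))
    return devices


if __name__ == "__main__":
    # 仅当环境中存在 ixsmi 命令时才实际调用
    if shutil.which("ixsmi"):
        result = subprocess.run(["ixsmi"], capture_output=True, text=True)
        print(list_devices(result.stdout))
    else:
        print("未找到 ixsmi 命令")
```

对上文的示例输出,该函数会返回两条记录,对应 0 号和 1 号 `Iluvatar BI-V150` 设备。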
+
+## 安装飞桨框架
+
+### 安装方式一:wheel 包安装
+
+iluvatar-gpu 支持插件式安装,需先安装飞桨 CPU 安装包,再安装飞桨 iluvatar-gpu 插件包。在启动的 docker 容器中,执行以下命令:
+
+```bash
+# 先安装飞桨 CPU 安装包
+python -m pip install paddlepaddle==3.1.0a0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
+
+# 再安装飞桨 iluvatar-gpu 插件包
+python -m pip install paddle-iluvatar-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/ixuca/
+```
+### 安装方式二:源代码编译安装
+
+在启动的 docker 容器中,先安装飞桨 CPU 安装包,再下载 PaddleCustomDevice 源码编译得到飞桨 iluvatar-gpu 插件包。
+
+```bash
+# 下载 PaddleCustomDevice 源码
+git clone https://github.com/PaddlePaddle/PaddleCustomDevice
+
+# 进入 PaddleCustomDevice 根目录,执行以下指令更新子模块代码
+cd PaddleCustomDevice
+git submodule sync
+git submodule update --init --recursive
+
+# 进入硬件后端(天数 iluvatar_gpu)目录
+cd backends/iluvatar_gpu
+
+# 先安装飞桨 CPU 安装包
+python -m pip install paddlepaddle==3.1.0a0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
+
+# 安装编译所需依赖
+cd /tmp
+wget https://github.com/protocolbuffers/protobuf/releases/download/v21.12/protoc-21.12-linux-x86_64.zip
+unzip protoc-21.12-linux-x86_64.zip
+mv bin/protoc /usr/local/bin/
+rm -rf protoc-21.12-linux-x86_64.zip include bin
+cd -
+
+pip install --upgrade setuptools wheel
+
+# 执行编译脚本
+bash build_paddle.sh
+
+# 编译产出在 build_pip 路径下,使用安装脚本进行安装
+bash install_paddle.sh
+```
+## 基础功能检查
+
+安装完成后,在 docker 容器中输入如下命令进行飞桨基础健康功能的检查。
+
+```bash
+# 列出可用硬件后端
+python3 -c "import paddle; print(paddle.device.get_all_custom_device_type())"
+```
+```bash
+# 预期得到如下输出结果
+['iluvatar_gpu']
+```
+```bash
+# 使用 paddle utils 模块的 `run_check` 功能检查 paddle_iluvatar_gpu 插件和 PaddlePaddle 主框架是否正常安装,需要指定 xccl 的后端为 iluvatar_gpu
+export PADDLE_XCCL_BACKEND=iluvatar_gpu
+python3 -c "import paddle; paddle.utils.run_check()"
+```
+```bash
+# 预期得到输出如下
+Running verify PaddlePaddle program ...
+PaddlePaddle works well on 1 iluvatar_gpu.
+PaddlePaddle works well on 16 iluvatar_gpus.
+PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
+```
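上述两步检查也可以合并为一个小脚本。下面是一个示意,假设只需确认 `iluvatar_gpu` 出现在 `paddle.device.get_all_custom_device_type()` 的返回值中。函数名 `missing_backends` 为示例假设,不属于飞桨接口。

```python
def missing_backends(available, required=("iluvatar_gpu",)):
    """返回 required 中未出现在 available 里的后端名称列表。"""
    return [name for name in required if name not in available]


if __name__ == "__main__":
    try:
        import paddle
    except ImportError:
        print("未安装 paddlepaddle,跳过检查")
    else:
        # 与上文命令相同的接口;返回值为空时按空列表处理
        available = paddle.device.get_all_custom_device_type() or []
        missing = missing_backends(available)
        if missing:
            print("缺少硬件后端:", missing)
        else:
            print("硬件后端检查通过:", available)
```

将 `required` 换成其他后端名称,即可用同样的方式检查别的插件是否已注册。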
+## 如何卸载
+
+请使用以下命令卸载 Paddle:
+
+```bash
+pip uninstall paddlepaddle paddle-iluvatar-gpu
+```
diff --git a/docs/hardware_support/metax/index_cn.rst b/docs/hardware_support/metax/index_cn.rst
new file mode 100644
index 00000000000..88d9ff08502
--- /dev/null
+++ b/docs/hardware_support/metax/index_cn.rst
@@ -0,0 +1,18 @@
+.. _cn_metax_information:
+
+####################
+METAX GPGPU 芯片
+####################
+
+沐曦曦云 C 系列芯片是沐曦推出的一款高性能通用人工智能计算芯片,曦云 C 系列通用 GPU(GPGPU)芯片是针对智算及通用计算的完美解决方案,沐曦自主知识产权架构提供强大高精度及多精度混合算力,可广泛应用于智算以及通用计算、教育和科研等场景。`点击这里 `_ 。
+
+飞桨框架支持基于沐曦曦云芯片的训练和推理,请参考以下内容快速体验:
+
+- `沐曦 曦云 C 系列 安装说明 <./install_cn.html>`_ : 沐曦 曦云 C 系列 安装说明
+- `沐曦 曦云 C 系列 基于框架的使用指南 <./paddle_tutorial_cn.html>`_ : 沐曦 曦云 C 系列 基于框架的使用指南
+
+.. toctree::
+ :hidden:
+
+ install_cn.md
+ paddle_tutorial_cn.md
diff --git a/docs/hardware_support/metax/install_cn.md b/docs/hardware_support/metax/install_cn.md
new file mode 100644
index 00000000000..fd9023b1557
--- /dev/null
+++ b/docs/hardware_support/metax/install_cn.md
@@ -0,0 +1,60 @@
+
+# 沐曦 曦云 C 系列 安装说明
+
+飞桨框架 MACA 版支持基于沐曦 MACA 软件栈的训练和推理,提供两种安装方式:
+
+1. 通过飞桨官网发布的 wheel 包安装
+2. 通过源代码编译安装得到 wheel 包
+
+## 沐曦 曦云 C 系列 系统要求
+
+| 要求类型 | 要求内容 |
+| --------- | -------- |
+| 芯片型号 | 沐曦曦云 C 系列芯片,包括 C500 |
+| 操作系统 | Linux 操作系统,包括 CentOS、Ubuntu、KylinV10 等 |
+
+
+## 安装飞桨框架
+### 安装方式一:wheel 包安装
+沐曦曦云 C500 支持插件式安装,需先安装飞桨 CPU 安装包,再安装飞桨 沐曦 插件包:
+```bash
+# 先安装飞桨 CPU 安装包
+python -m pip install paddlepaddle==3.1.0a0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
+# 再安装飞桨 曦云 C500 插件包
+python -m pip install paddle-metax-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/maca/
+```
+
+### 安装方式二:源代码编译安装
+在启动的 docker 容器中,先安装飞桨 CPU 安装包,再下载 PaddleCustomDevice 源码编译得到飞桨 C500 插件包。
+
+```bash
+# 下载 PaddleCustomDevice 源码
+git clone https://github.com/PaddlePaddle/PaddleCustomDevice -b release/3.1
+
+# 进入 PaddleCustomDevice 根目录,执行以下指令更新子模块代码
+cd PaddleCustomDevice
+git submodule sync
+git submodule update --init --recursive
+
+# 进入硬件后端(沐曦 曦云 C500)目录
+cd backends/metax_gpu
+
+# 先安装飞桨 CPU 安装包
+python -m pip install paddlepaddle==3.1.0a0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
+
+# 编译安装,以下两种方式任选其一
+# 方式一:
+bash build_in_metax.sh
+# 方式二:
+bash change_patch.sh  # 只需执行一次
+bash compile.sh       # 可执行多次
+
+# 编译产出在 build/dist 路径下,使用 pip 安装
+pip install build/dist/*.whl --force-reinstall
+
+```
+## 如何卸载
+
+请使用以下命令卸载 Paddle:
+
+```bash
+pip uninstall paddlepaddle paddle-metax-gpu
+```
diff --git a/docs/hardware_support/metax/paddle_tutorial_cn.md b/docs/hardware_support/metax/paddle_tutorial_cn.md
new file mode 100644
index 00000000000..0e98bb02c4e
--- /dev/null
+++ b/docs/hardware_support/metax/paddle_tutorial_cn.md
@@ -0,0 +1,34 @@
+
+
+# 沐曦 曦云 C500 基于 PaddlePaddle 框架的使用指南
+
+## 一、环境准备
+
+### 环境说明
+
+* 本教程介绍如何在沐曦 曦云 C500 上安装和使用飞桨框架
+
+* 考虑到环境差异性,我们推荐使用教程提供的标准镜像完成环境准备:
+
+ * x86_64 镜像链接:您可以联系 MetaX 或访问 https://sw-download.metax-tech.com 获取对应的镜像文件。
+
+ * 镜像中已经默认安装了沐曦 MACA 软件栈
+
+
+### 环境安装
+
+1. 安装 PaddlePaddle
+
+*该命令会自动安装飞桨主框架自动构建的 release-3.1-build 版本*
+
+```shell
+python -m pip install paddlepaddle==3.1.0a0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
+```
+
+2. 安装 CustomDevice
+
+*该命令会自动安装飞桨 Custom Device 自动构建的 release-3.1-build 版本*
+
+```shell
+python -m pip install paddle-metax-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/maca/
+```
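上面两条命令分别安装 `3.1.0a0` 的主框架和 `3.1.0` 的插件包,二者都来自 release-3.1 构建。下面的示意代码用标准库 `importlib.metadata` 读取已安装版本,并比较两者的“主版本.次版本”是否一致。“两者需处于同一发布系列”是此处的假设,文档本身并未给出兼容性规则;函数名 `release_line`、`same_release_line` 也是示例命名。

```python
import re
from importlib import metadata


def release_line(version):
    """取版本号的 "主版本.次版本" 前缀,例如 "3.1.0a0" 得到 "3.1"。"""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"无法解析版本号: {version}")
    return f"{match.group(1)}.{match.group(2)}"


def same_release_line(framework_version, plugin_version):
    """判断主框架与插件的版本是否属于同一 "主版本.次版本" 系列。"""
    return release_line(framework_version) == release_line(plugin_version)


if __name__ == "__main__":
    try:
        framework = metadata.version("paddlepaddle")
        plugin = metadata.version("paddle-metax-gpu")
    except metadata.PackageNotFoundError as err:
        print(f"未找到已安装的包: {err}")
    else:
        print(framework, plugin, same_release_line(framework, plugin))
```

按上文命令安装后,`same_release_line("3.1.0a0", "3.1.0")` 的结果为 `True`。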
diff --git a/docs/hardware_support/mlu/install_cn.md b/docs/hardware_support/mlu/install_cn.md
index e656681bb9c..352c89603b2 100644
--- a/docs/hardware_support/mlu/install_cn.md
+++ b/docs/hardware_support/mlu/install_cn.md
@@ -89,25 +89,24 @@ cnmon
```bash
# 先安装飞桨 CPU 安装包
-pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu
+python -m pip install paddlepaddle==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
# 再安装飞桨 MLU 插件包
-pip install paddle-custom-mlu -i https://www.paddlepaddle.org.cn/packages/nightly/mlu
+python -m pip install paddle-custom-mlu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/mlu/
```
-⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。
### 安装方式二:源代码编译安装
在启动的 docker 容器中,先安装飞桨 CPU 安装包,再下载 PaddleCustomDevice 源码编译得到飞桨 MLU 插件包。
```bash
# 下载 PaddleCustomDevice 源码
-git clone https://github.com/PaddlePaddle/PaddleCustomDevice
+git clone https://github.com/PaddlePaddle/PaddleCustomDevice -b release/3.1
# 进入硬件后端(寒武纪 MLU)目录
cd PaddleCustomDevice/backends/mlu
# 先安装飞桨 CPU 安装包
-pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu
+python -m pip install paddlepaddle==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
# 执行编译脚本 - submodule 在编译时会按需下载
bash tools/compile.sh
@@ -115,7 +114,6 @@ bash tools/compile.sh
# 飞桨 MLU 插件包在 build/dist 路径下,使用 pip 安装即可
pip install build/dist/paddle_custom_mlu*.whl
```
-⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。
## 基础功能检查
安装完成后,在 docker 容器中输入如下命令进行飞桨基础健康功能的检查。
@@ -126,7 +124,7 @@ python -c "import paddle_custom_device; paddle_custom_device.mlu.version()"
```
```bash
# 预期得到如下输出结果
-version: 0.0.0
+version: 3.1.0
commit: 147d506b2baa1971ab47b4550f0571e1f6b201fc
cntoolkit: 3.8.2
cnnl: 1.23.2
diff --git a/docs/hardware_support/npu/install_cn.md b/docs/hardware_support/npu/install_cn.md
index a944fd99afd..a4f4744a623 100644
--- a/docs/hardware_support/npu/install_cn.md
+++ b/docs/hardware_support/npu/install_cn.md
@@ -94,25 +94,24 @@ npu-smi info
```bash
# 先安装飞桨 CPU 安装包
-pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu
+python -m pip install paddlepaddle==3.1.0a0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
# 再安装飞桨 NPU 插件包
-pip install paddle-custom-npu -i https://www.paddlepaddle.org.cn/packages/nightly/npu
+python -m pip install paddle-custom-npu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/npu/
```
-⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。
### 安装方式二:源代码编译安装
在启动的 docker 容器中,先安装飞桨 CPU 安装包,再下载 PaddleCustomDevice 源码编译得到飞桨 NPU 插件包。
```bash
# 下载 PaddleCustomDevice 源码
-git clone https://github.com/PaddlePaddle/PaddleCustomDevice
+git clone https://github.com/PaddlePaddle/PaddleCustomDevice -b release/3.1
# 进入硬件后端(昇腾 NPU)目录
cd PaddleCustomDevice/backends/npu
# 先安装飞桨 CPU 安装包
-pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu
+python -m pip install paddlepaddle==3.1.0a0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
# 执行编译脚本 - submodule 在编译时会按需下载
bash tools/compile.sh
@@ -120,7 +119,6 @@ bash tools/compile.sh
# 飞桨 NPU 插件包在 build/dist 路径下,使用 pip 安装即可
pip install build/dist/paddle_custom_npu*.whl
```
-⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。
## 基础功能检查
安装完成后,在 docker 容器中输入如下命令进行飞桨基础健康功能的检查。
@@ -131,7 +129,7 @@ python -c "import paddle_custom_device; paddle_custom_device.npu.version()"
```
```bash
# 预期得到如下输出结果
-version: 0.0.0
+version: 3.1.0
commit: 147d506b2baa1971ab47b4550f0571e1f6b201fc
cann: 8.0.RC2
....
diff --git a/docs/hardware_support/sdaa/install_cn.md b/docs/hardware_support/sdaa/install_cn.md
index f1c90af581d..0de112e59d7 100644
--- a/docs/hardware_support/sdaa/install_cn.md
+++ b/docs/hardware_support/sdaa/install_cn.md
@@ -83,19 +83,18 @@ SDAA 支持插件式安装,需先安装飞桨 CPU 安装包,再安装飞桨
```bash
# 先安装飞桨 CPU 安装包
-pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu
+python -m pip install paddlepaddle==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
# 再安装飞桨 SDAA 插件包
-pip install paddle-sdaa -i https://www.paddlepaddle.org.cn/packages/nightly/sdaa
+python -m pip install paddle-sdaa==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/sdaa/
```
-⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0 版本:https://www.paddlepaddle.org.cn/packages/stable/sdaa/paddle-sdaa/
### 安装方式二:源代码编译安装
在启动的 docker 容器中,先安装飞桨 CPU 安装包,再下载 PaddleCustomDevice 源码编译得到飞桨 SDAA 插件包。
```bash
# 下载 PaddleCustomDevice 源码
-git clone https://github.com/PaddlePaddle/PaddleCustomDevice
+git clone https://github.com/PaddlePaddle/PaddleCustomDevice -b release/3.1
# 在 PaddleCUstomDevice 根目录下执行以下指令更新子模块代码
git submodule sync
@@ -105,7 +104,7 @@ git submodule update --init --recursive
cd backends/sdaa
# 先安装飞桨 CPU 安装包
-pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu
+python -m pip install paddlepaddle==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
# 执行编译脚本
bash compile.sh
@@ -113,7 +112,6 @@ bash compile.sh
# 编译产出在 build/dist 路径下,使用 pip 安装
pip install build/dist/*.whl --force-reinstall
```
-⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0 版本。
## 基础功能检查
安装完成后,在 docker 容器中输入如下命令进行飞桨基础健康功能的检查。
diff --git a/docs/hardware_support/xpu/index_cn.rst b/docs/hardware_support/xpu/index_cn.rst
index e81bdc05a6c..ad9b9adf3f5 100644
--- a/docs/hardware_support/xpu/index_cn.rst
+++ b/docs/hardware_support/xpu/index_cn.rst
@@ -12,10 +12,6 @@
飞桨框架支持基于昆仑芯 XPU 芯片的训练和推理,请参考以下内容快速体验:
-- `昆仑芯 XPU 安装说明 <./xpu-gen2_install_cn.html>`_: 昆仑芯 XPU 二代芯片安装说明
-- `昆仑芯 XPU 基于框架的使用指南 <./xpu-gen2_paddle_tutorial_cn.html>`_ : 昆仑芯 XPU 二代芯片基于框架的使用指南
-- `昆仑芯 XPU 基于套件的使用指南 <./xpu-gen2_paddlex_tutorial_cn.html>`_ : 昆仑芯 XPU 二代芯片基于套件的使用指南
-- `昆仑芯 XPU 支持模型 <./xpu-gen2_support_cn.html>`_ : 昆仑芯 XPU 二代芯片支持模型
- `昆仑芯 XPU 安装说明 <./xpu-p800_install_cn.html>`_: 昆仑芯 XPU P800 安装说明
- `昆仑芯 XPU 基于框架的使用指南 <./xpu-p800_paddle_tutorial_cn.html>`_ : 昆仑芯 XPU P800 基于框架的使用指南
- `昆仑芯 XPU 基于套件的使用指南 <./xpu-p800_paddlex_tutorial_cn.html>`_ : 昆仑芯 XPU P800 基于套件的使用指南
@@ -24,10 +20,6 @@
.. toctree::
:hidden:
- xpu-gen2_install_cn.md
- xpu-gen2_paddle_tutorial_cn.md
- xpu-gen2_paddlex_tutorial_cn.md
- xpu-gen2_support_cn.md
xpu-p800_install_cn.md
xpu-p800_paddle_tutorial_cn.md
xpu-p800_paddlex_tutorial_cn.md
diff --git a/docs/hardware_support/xpu/xpu-gen2_install_cn.md b/docs/hardware_support/xpu/xpu-gen2_install_cn.md
deleted file mode 100644
index e9406376acd..00000000000
--- a/docs/hardware_support/xpu/xpu-gen2_install_cn.md
+++ /dev/null
@@ -1,154 +0,0 @@
-# 昆仑芯 XPU 安装说明
-
-飞桨框架 XPU 版支持昆仑芯 XPU 的训练和推理,提供两种安装方式:
-
-1. 通过飞桨官网发布的 wheel 包安装
-2. 通过源代码编译安装得到 wheel 包
-
-## 昆仑芯 XPU 系统要求
-
-| 要求类型 | 要求内容 |
-| --------- | -------- |
-| 芯片型号 | 昆仑芯 2 代,包括 R200、R300、R200-8F、RG800 |
-| 操作系统 | Linux 操作系统,包括 Ubuntu、CentOS、KylinV10 |
-
-**注意**:当前教程适用于『昆仑芯』二代芯片。查看芯片类型请参考如下命令:
-
-```bash
-# 系统环境下运行如下命令,如果有设备列表输出,且字段为 3684,则说明芯片为昆仑芯二代芯片
-lspci -d 1d22: -n
-```
-
-## 运行环境准备
-
-推荐使用飞桨官方发布的昆仑芯 XPU 开发镜像,该镜像预装有昆仑芯基础运行环境库(XRE)。
-
-```bash
-# 拉取镜像
-docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-xpu:ubuntu20-x86_64-gcc84-py310
-```
-```bash
-# 参考如下命令,启动容器
-docker run -it --name paddle-xpu-dev -v $(pwd):/work \
- -w=/work --shm-size=128G --network=host --privileged \
- --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
- ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-xpu:ubuntu20-x86_64-gcc84-py310 /bin/bash
-```
-#### 选项说明及可调整参数
-
-##### ① `--name paddle-xpu-dev`
-- **作用**:指定容器名称。
-- **可调整**:
- - 用户可改为其他名称,例如 `paddle-xpu-test`,方便区分不同实验。
-
-##### ② `-v $(pwd):/work`
-- **作用**:挂载本地目录到容器内 `/work` 目录。
-- **可调整**:
- - 可以修改 `$(pwd)` 为实际路径,例如 `-v /data/projects:/work`,让容器访问宿主机的数据。
-
-##### ③ `--shm-size=128G`
-- **作用**:设置共享内存大小,影响数据处理和计算效率。
-- **可调整**:
- - 若内存有限,可降低,如 `--shm-size=32G`,但可能影响大规模训练。
- - 若训练任务需要更大共享内存,可提高,如 `--shm-size=256G`。
-```bash
-# 检查容器内是否可以正常识别昆仑芯 XPU 设备
-xpu_smi
-```
-```bash
-# 预期得到输出如下
-Runtime Version: 4.31
-Driver Version: 4.0
- DEVICES
--------------------------------------------------------------------------------------------
-| DevID | PCI Addr | Model | SN | INODE | UseRate | L3 | Memory |
--------------------------------------------------------------------------------------------
-| 0 | 0000:53:00.0 | R300 | 02Kxxx | /dev/xpu0 | 0 % | 0 / 63 MB | 0 / 32768 MB |
-| 1 | 0000:56:00.0 | R300 | 02Kxxx | /dev/xpu1 | 0 % | 0 / 63 MB | 0 / 32768 MB |
--------------------------------------------------------------------------------------------
- VIDEO
------------------------------------------------------------------------------------
-| DevID | Model | DEC | ENC | IMGPROC |
------------------------------------------------------------------------------------
-| 0 | R300 | 0 %, 0 fps, 800 MHz | 0 %, 0 fps, 800 MHz | 0 %, 0 fps, 800 MHz |
-| 1 | R300 | 0 %, 0 fps, 800 MHz | 0 %, 0 fps, 800 MHz | 0 %, 0 fps, 800 MHz |
------------------------------------------------------------------------------------
- PROCESSES
--------------------------------------------------
-| DevID | PID | Streams | L3 | Memory | Command |
--------------------------------------------------
--------------------------------------------------
-```
-
-## 安装飞桨框架
-
-**注意**:当前飞桨 develop 分支仅支持 X86 架构,如需昆仑芯 XPU 的 ARM 架构支持,请切换到 [release/2.6](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.6/guides/hardware_support/xpu/install_cn.html) 分支。
-
-### 安装方式一:wheel 包安装
-
-在启动的 docker 容器中,下载并安装飞桨官网发布的 wheel 包。
-
-```bash
-# 下载并安装 wheel 包
-pip install paddlepaddle-xpu -i https://www.paddlepaddle.org.cn/packages/nightly/xpu
-```
-⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。
-### 安装方式二:源代码编译安装
-
-在启动的 docker 容器中,下载 Paddle 源码并编译,CMAKE 编译选项含义请参见[编译选项表](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#Compile)。
-
-```bash
-# 下载 Paddle 源码
-git clone https://github.com/PaddlePaddle/Paddle.git -b develop
-cd Paddle
-
-# 创建编译目录
-mkdir build && cd build
-
-# cmake 编译命令
-cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DCMAKE_CXX_FLAGS="-Wno-error -w" \
- -DPY_VERSION=3.10 -DPYTHON_EXECUTABLE=`which python3` -DWITH_CUSTOM_DEVICE=OFF \
- -DWITH_TESTING=OFF -DON_INFER=ON -DWITH_DISTRIBUTE=ON -DWITH_ARM=OFF \
- -DWITH_XPU=ON -DWITH_XPU_BKCL=ON -DWITH_UBUNTU=ON
-
-# make 编译命令
-make -j16
-
-# 编译产出在 build/python/dist/ 路径下,使用 pip 安装即可
-pip install -U paddlepaddle_xpu-0.0.0-cp310-cp310-linux_x86_64.whl
-```
-⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。
-## 基础功能检查
-
-安装完成后,在 docker 容器中输入如下命令进行飞桨基础健康功能的检查。
-
-```bash
-# 检查当前安装版本
-python -c "import paddle; paddle.version.show()"
-```
-```bash
-# 预期得到输出如下
-commit: 84425362060e126b066a5a0f0d29ae2e2218a834
-xpu: 20240104
-xpu_xccl: 1.1.8.1
-xpu_xhpc: 20240312
-```
-```bash
-# 飞桨基础健康检查
-python -c "import paddle; paddle.utils.run_check()"
-```
-```bash
-# 预期得到输出如下
-Running verify PaddlePaddle program ...
-PaddlePaddle works well on 1 XPU.
-PaddlePaddle works well on 8 XPUs.
-PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
-```
-
-## 如何卸载
-
-请使用以下命令卸载:
-
-```bash
-pip uninstall paddlepaddle-xpu
-```
diff --git a/docs/hardware_support/xpu/xpu-gen2_paddle_tutorial_cn.md b/docs/hardware_support/xpu/xpu-gen2_paddle_tutorial_cn.md
deleted file mode 100644
index df6a9f17f8a..00000000000
--- a/docs/hardware_support/xpu/xpu-gen2_paddle_tutorial_cn.md
+++ /dev/null
@@ -1,110 +0,0 @@
-# 昆仑芯 XPU 基于框架的使用指南
-
-## 一、环境准备
-
-### 环境说明
-
-* 本教程介绍如何基于昆仑芯 XPU 进行 ResNet50 的训练,总共需要 1 卡进行训练
-
-* 考虑到环境差异性,我们推荐使用教程提供的标准镜像完成环境准备:
-
- * 镜像链接: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-xpu:ubuntu20-x86_64-gcc84-py310
-
-### 环境安装
-
-安装 PaddlePaddle
-
-*该命令会自动安装飞桨主框架每日自动构建的 nightly-build 版本*
-
-*由于 xpu 代码位于飞桨主框架中,因此我们不需要安装额外的 Custom Device 包*
-
-```shell
-python -m pip install paddlepaddle-xpu -i https://www.paddlepaddle.org.cn/packages/nightly/xpu/
-```
-⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。
-## 二、运行示例
-
-飞桨框架集成了经典的视觉模型用于帮助用户快速上手,我们将基于 ResNet50 结构,在 Cifar10 数据集上进行一次快速训练,用于帮助您了解如何基于昆仑芯 XPU 进行训练(和 GPU 训练代码相比,差异点仅为 `paddle.set_device("xpu")`)
-
-注意:
-
-* *本教程主要用于快速入门,并未对参数进行细致调优,训练效果未必是最好的,您可以自行调整超参数进行效果调优*
-
-* *本教程预计使用单卡 R300 训练 40 分钟*
-
-1. 导入必要的包
-
-```python
-import paddle
-from paddle.vision import transforms
-from paddle.vision.models import resnet50
-```
-
-2. 设置运行设备
-
-```python
-# 1. 设定运行设备为 xpu
-paddle.set_device("xpu")
-```
-
-3. 加载训练数据集
-
-```python
-# 2. 定义数据集、数据预处理方法与 DataLoader
-transform = transforms.Compose([
- transforms.Resize(224),
- transforms.ToTensor(),
- transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
-])
-train_set = paddle.vision.datasets.Cifar10(mode='train', transform=transform)
-train_loader = paddle.io.DataLoader(train_set, batch_size=128, num_workers=8)
-```
-
-4. 定义网络结构和损失函数
-
-```python
-# 3. 定义网络结构
-net = resnet50(num_classes=10)
-# 4. 定义损失函数
-net_loss = paddle.nn.CrossEntropyLoss()
-# 5. 定义优化器
-optimizer = paddle.optimizer.Adam(learning_rate=0.001, parameters=net.parameters())
-```
-
-5. 启动训练
-
-训练过程中会打印 loss 的变化情况,可以观察到 loss 在初步下降,这意味着模型参数逐渐适应了该数据集。
-
-```python
-net.train()
-for epoch in range(10):
- for batch_idx, data in enumerate(train_loader, start=0):
- inputs, labels = data
- optimizer.clear_grad()
- # 6. 前向传播并计算损失
- outputs = net(inputs)
- loss = net_loss(outputs, labels)
- # 7. 反向传播
- loss.backward()
- # 8. 更新参数
- optimizer.step()
- print('Epoch %d, Iter %d, Loss: %.5f' % (epoch + 1, batch_idx + 1, loss))
-print('Finished Training')
-```
-
-6. 测试模型效果
-
-```python
-test_dataset = paddle.vision.datasets.Cifar10(mode='test', transform=transform)
-
-# 测试 5 张图片效果
-for i in range(5):
- test_image, gt = test_dataset[0]
- # CHW -> NCHW
- test_image = test_image.unsqueeze(0)
-
- # 取预测分布中的最大值
- res = net(test_image).argmax().numpy()
- print(f"图像{i} 标签:{gt}")
- print(f"模型预测结果:{res}")
-```
diff --git a/docs/hardware_support/xpu/xpu-gen2_paddlex_tutorial_cn.md b/docs/hardware_support/xpu/xpu-gen2_paddlex_tutorial_cn.md
deleted file mode 100644
index 51084a9ebba..00000000000
--- a/docs/hardware_support/xpu/xpu-gen2_paddlex_tutorial_cn.md
+++ /dev/null
@@ -1,136 +0,0 @@
-# 昆仑芯 XPU 基于 PaddleX 的使用指南
-
-## 环境准备
-
-### 环境说明
-
-* 本教程介绍如何基于昆仑芯 XPU 进行 ResNet50 的训练,总共需要 4 卡进行训练
-
-* 考虑到环境差异性,我们推荐使用教程提供的标准镜像完成环境准备:
-
- * 镜像链接: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-xpu:ubuntu20-x86_64-gcc84-py310
-
-### 环境安装
-
-1. 安装 PaddlePaddle
-
-*该命令会自动安装飞桨主框架每日自动构建的 nightly-build 版本*
-
-*由于 xpu 代码位于飞桨主框架中,因此我们不需要安装额外的 Custom Device 包*
-
-```shell
-python -m pip install paddlepaddle-xpu -i https://www.paddlepaddle.org.cn/packages/nightly/xpu/
-```
-
-2. 安装 PaddleX 代码库
-
-```shell
-git clone https://github.com/PaddlePaddle/PaddleX.git
-
-# 如果速度较慢,可以考虑从 gitee 拉取
-# git clone https://gitee.com/paddlepaddle/PaddleX.git
-
-cd PaddleX
-
-# 安装 PaddleX whl
-# -e:以可编辑模式安装,当前项目的代码更改,都会直接作用到已经安装的 PaddleX Wheel
-pip install -e .
-```
-⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。
-## 基于 PaddleX 训练 ResNet50
-
-### 一、安装 PaddleX 依赖
-
-```shell
-# 跳转到 PaddleX 根目录下
-cd /path/to/paddlex
-
-# 安装 PaddleX 相关依赖,由于我们使用的是图像分类模型,因此安装图像分类库
-paddlex --install PaddleClas
-
-# 完成安装后会有如下提示:
-# All packages are installed.
-```
-
-### 二、数据准备
-
-为了快速上手验证,我们基于 flowers 102 数据集进行快速体验:
-
-1. 下载数据集
-
-```shell
-# 跳转到 PaddleX 根目录下
-cd /path/to/paddlex
-
-# 下载并解压数据
-wget https://paddle-model-ecology.bj.bcebos.com/paddlex/data/cls_flowers_examples.tar -P ./dataset
-tar -xf ./dataset/cls_flowers_examples.tar -C ./dataset/
-```
-
-2. 数据校验
-
-```shell
-# PaddleX 支持对数据集进行校验,确保数据集格式符合 PaddleX 的相关要求。同时在数据校验时,能够对数据集进行分析,统计数据集的基本信息。
-python main.py -c paddlex/configs/image_classification/ResNet50.yaml \
- -o Global.mode=check_dataset \
- -o Global.dataset_dir=./dataset/cls_flowers_examples
-
-# 命令运行成功后会在 log 中打印出 Check dataset passed ! 信息
-```
-
-更多关于 PaddleX 数据集说明的内容,可以查看 [PaddleX 图像分类模块数据准备](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/module_usage/tutorials/cv_modules/image_classification.md#41-%E6%95%B0%E6%8D%AE%E5%87%86%E5%A4%87)
-
-### 三、模型训练
-
-进入 `PaddleX` 目录下,执行如下命令启动 4 卡 XPU(0 ~ 3 号卡)训练,其中:
-
-* 参数 `-o Global.device` 指定的是即将运行的设备,这里需要传入的是 `xpu:0,1,2,3` ,通过指定该参数,PaddleX 调用飞桨的设备指定接口 `paddle.set_device` 来指定运行设备为 `xpu` ,在进行模型训练时,飞桨将自动调用 xpu 算子用于执行模型计算。关于设备指定的更多细节,可以参考官方 api [paddle.set_device](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/device/set_device_cn.html#set-device)。
-
-* 参数 `-c paddlex/configs/modules/image_classification/ResNet50.yaml` 表示读取指定目录下的配置文件,配置文件中指定了模型结构,训练超参等所有训练模型需要用到的配置,该文件中指定的模型结构为 `ResNet50`
-
-```shell
-python main.py -c paddlex/configs/modules/image_classification/ResNet50.yaml \
- -o Global.mode=train \
- -o Global.dataset_dir=./dataset/cls_flowers_examples \
- -o Global.output=resnet50_output \
- -o Global.device="xpu:0,1,2,3"
-```
-
-上述命令会在 `PaddleX` 目录下产生一个 `resnet50_output/` 目录,该目录会存放训练过程中的模型参数
-
-### 四、模型推理
-
-#### 基于 PaddleInference 推理
-
-训练完成后,最优权重放在 `resnet50_output/best_model/` 目录下,其中 `inference/inference.pdiparams`、`inference/inference.pdiparams.info`、`inference/inference.pdmodel` 3 个文件为静态图文件,用于推理使用,使用如下命令进行推理
-
-```shell
-python main.py -c paddlex/configs/modules/image_classification/ResNet50.yaml \
- -o Global.mode=predict \
- -o Predict.model_dir="./resnet50_output/best_model/inference" \
- -o Predict.input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg" \
- -o Global.device="xpu:0"
-```
-
-#### 转换 ONNX 模型
-
-如果您有额外的部署需求需要基于 ONNX 实现,我们也提供了专用的工具用于导出 ONNX 模型,参考如下步骤,即可将第一步导出的静态图模型转换为 ONNX 模型:
-
-a. 安装环境
-
-```shell
-# 安装 paddle2onnx,该工具支持将 PaddleInference 模型转换为 ONNX 格式
-python -m pip install paddle2onnx
-```
-
-b. 模型转换
-
-```shell
-paddle2onnx --model_dir=./resnet50_output/best_model/inference \
- --model_filename=inference.pdmodel \
- --params_filename=inference.pdiparams \
- --save_file=./resnet50_output/best_model/inference.onnx \
- --enable_onnx_checker=True
-```
-
-该命令会在 `resnet50_output/best_model` 目录下生成 `inference.onnx` 文件
diff --git a/docs/hardware_support/xpu/xpu-gen2_support_cn.md b/docs/hardware_support/xpu/xpu-gen2_support_cn.md
deleted file mode 100644
index 9500f652170..00000000000
--- a/docs/hardware_support/xpu/xpu-gen2_support_cn.md
+++ /dev/null
@@ -1,54 +0,0 @@
-# 昆仑芯 XPU 支持模型
-
-飞桨框架在昆仑芯 XPU 上通过精度验证的模型情况如下:
-
-* PaddleX 使用文档详见:[PaddleX 多硬件使用](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/other_devices_support/multi_devices_use_guide.md)
-* PaddleNLP 大语言模型多硬件使用文档详见:[PaddleNLP XPU 大语言模型使用文档](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm/xpu)
-* 如果您适配/验证过更多模型,欢迎向飞桨开源社区贡献适配代码,然后邮件联系我们更新本列表 [ext_paddle_oss](ext_paddle_oss@baidu.com)
-
-| 模型库 | 模型类型 | 模型名称 | 训练 | 推理 |
-| - | - | - | - | - |
-| PaddleX | 图像分类 | [ResNet18](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/ResNet18.yaml) | √ | √ |
-| PaddleX | 图像分类 | [ResNet34](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/ResNet34.yaml) | √ | √ |
-| PaddleX | 图像分类 | [ResNet50](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/ResNet50.yaml) | √ | √ |
-| PaddleX | 图像分类 | [ResNet101](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/ResNet101.yaml) | √ | √ |
-| PaddleX | 图像分类 | [ResNet152](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/ResNet152.yaml) | √ | √ |
-| PaddleX | 图像分类 | [PP-LCNet_x0_25](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/PP-LCNet_x0_25.yaml) | √ | √ |
-| PaddleX | 图像分类 | [PP-LCNet_x0_35](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/PP-LCNet_x0_35.yaml) | √ | √ |
-| PaddleX | 图像分类 | [PP-LCNet_x0_5](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/PP-LCNet_x0_5.yaml) | √ | √ |
-| PaddleX | 图像分类 | [PP-LCNet_x0_75](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/PP-LCNet_x0_75.yaml) | √ | √ |
-| PaddleX | 图像分类 | [PP-LCNet_x1_0](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/PP-LCNet_x1_0.yaml) | √ | √ |
-| PaddleX | 图像分类 | [PP-LCNet_x1_5](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/PP-LCNet_x1_5.yaml) | √ | √ |
-| PaddleX | 图像分类 | [PP-LCNet_x2_0](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/PP-LCNet_x2_0.yaml) | √ | √ |
-| PaddleX | 图像分类 | [PP-LCNet_x2_5](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/PP-LCNet_x2_5.yaml) | √ | √ |
-| PaddleX | 图像分类 | [MobileNetV3_small_x0_35](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/MobileNetV3_small_x0_35.yaml) | √ | √ |
-| PaddleX | 图像分类 | [MobileNetV3_small_x0_5](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/MobileNetV3_small_x0_5.yaml) | √ | √ |
-| PaddleX | 图像分类 | [MobileNetV3_small_x0_75](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/MobileNetV3_small_x0_75.yaml) | √ | √ |
-| PaddleX | 图像分类 | [MobileNetV3_small_x1_0](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/MobileNetV3_small_x1_0.yaml) | √ | √ |
-| PaddleX | 图像分类 | [MobileNetV3_small_x1_25](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/MobileNetV3_small_x1_25.yaml) | √ | √ |
-| PaddleX | 图像分类 | [MobileNetV3_large_x0_35](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/MobileNetV3_large_x0_35.yaml) | √ | √ |
-| PaddleX | 图像分类 | [MobileNetV3_large_x0_5](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/MobileNetV3_large_x0_5.yaml) | √ | √ |
-| PaddleX | 图像分类 | [MobileNetV3_large_x0_75](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/MobileNetV3_large_x0_75.yaml) | √ | √ |
-| PaddleX | 图像分类 | [MobileNetV3_large_x1_0](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/MobileNetV3_large_x1_0.yaml) | √ | √ |
-| PaddleX | 图像分类 | [MobileNetV3_large_x1_25](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/MobileNetV3_large_x1_25.yaml) | √ | √ |
-| PaddleX | 图像分类 | [PP-HGNet_small](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/image_classification/PP-HGNet_small.yaml) | √ | √ |
-| PaddleX | 目标检测 | [PP-YOLOE_plus-S](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/object_detection/PP-YOLOE_plus-S.yaml) | √ | √ |
-| PaddleX | 目标检测 | [PP-YOLOE_plus-M](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/object_detection/PP-YOLOE_plus-M.yaml) | √ | √ |
-| PaddleX | 目标检测 | [PP-YOLOE_plus-L](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/object_detection/PP-YOLOE_plus-L.yaml) | √ | √ |
-| PaddleX | 目标检测 | [PP-YOLOE_plus-X](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/object_detection/PP-YOLOE_plus-X.yaml) | √ | √ |
-| PaddleX | 目标检测 | [PicoDet-S](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/object_detection/PicoDet-S.yaml) | √ | √ |
-| PaddleX | 目标检测 | [PicoDet-L](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/object_detection/PicoDet-L.yaml) | √ | √ |
-| PaddleX | 语义分割 | [PP-LiteSeg-T](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/semantic_segmentation/PP-LiteSeg-T.yaml) | √ | √ |
-| PaddleX | 文本检测 | [PP-OCRv4_server_det](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/text_detection/PP-OCRv4_server_det.yaml) | √ | √ |
-| PaddleX | 文本检测 | [PP-OCRv4_mobile_det](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/text_detection/PP-OCRv4_mobile_det.yaml) | √ | √ |
-| PaddleX | 文本识别 | [PP-OCRv4_server_rec](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/text_recognition/PP-OCRv4_server_rec.yaml) | √ | √ |
-| PaddleX | 文本识别 | [PP-OCRv4_mobile_rec](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/text_recognition/PP-OCRv4_mobile_rec.yaml) | √ | √ |
-| PaddleX | 版面分析 | [PicoDet_layout_1x](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/layout_detection/PicoDet_layout_1x.yaml) | √ | √ |
-| PaddleX | 图像异常检测 | [STFPM](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/anomaly_detection/STFPM.yaml) | √ | √ |
-| PaddleX | 人脸检测 | [PicoDet_LCNet_x2_5_face](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta2/paddlex/configs/modules/face_detection/PicoDet_LCNet_x2_5_face.yaml) | √ | √ |
-| PaddleX | 时序预测 | [DLinear](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/ts_forecast/DLinear.yaml) | √ | √ |
-| PaddleX | 时序预测 | [RLinear](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/ts_forecast/RLinear.yaml) | √ | √ |
-| PaddleX | 时序预测 | [NLinear](https://github.com/PaddlePaddle/PaddleX/blob/develop/paddlex/configs/modules/ts_forecast/NLinear.yaml) | √ | √ |
-| PaddleNLP | 自然语言理解模型 | [BERT](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/slm/model_zoo/bert) | √ | √ |
-| PaddleNLP | 自然语言理解模型 | [ERINE3.0](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/slm/model_zoo/ernie-3.0/configs/modules/default.yml) | √ | √ |
-| PaddleNLP | 大语言模型 | [LLaMA](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm/devices/xpu/llama) | √ | √ |
diff --git a/docs/hardware_support/xpu/xpu-p800_install_cn.md b/docs/hardware_support/xpu/xpu-p800_install_cn.md
index 181393abf3b..acf5a6a3917 100644
--- a/docs/hardware_support/xpu/xpu-p800_install_cn.md
+++ b/docs/hardware_support/xpu/xpu-p800_install_cn.md
@@ -68,16 +68,15 @@ xpu-smi
```bash
# 下载并安装 wheel 包
-python -m pip install --pre paddlepaddle-xpu -i https://www.paddlepaddle.org.cn/packages/nightly/xpu-p800/
+python -m pip install paddlepaddle-xpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/xpu-p800/
```
-⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。
### 安装方式二:源代码编译安装
在启动的 docker 容器中,下载 Paddle 源码并编译,CMAKE 编译选项含义请参见[编译选项表](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#Compile)。
```bash
# 下载 Paddle 源码
-git clone https://github.com/PaddlePaddle/Paddle.git -b develop
+git clone https://github.com/PaddlePaddle/Paddle.git -b release/3.1
cd Paddle
# 创建编译目录
@@ -91,9 +90,8 @@ cmake .. -DPY_VERSION=3.10 -DCMAKE_BUILD_TYPE=Release -DWITH_GPU=OFF -DWITH_XPU=
make -j50 TARGET=HASWELL
# 编译产出在 build/python/dist/ 路径下,使用 pip 安装即可
-pip install -U paddlepaddle_xpu-0.0.0-cp310-cp310-linux_x86_64.whl
+pip install -U paddlepaddle_xpu-*-linux_x86_64.whl
```
-⚠️ 注意:nightly 版本为每日构建,可能存在不稳定性。如果需要更稳定的版本,建议使用 3.0-rc 版本。
## 基础功能检查
安装完成后,在 docker 容器中输入如下命令进行飞桨基础健康功能的检查。
diff --git a/docs/install/Tables.md b/docs/install/Tables.md
index fd0f93c656e..a9e46d410ff 100644
--- a/docs/install/Tables.md
+++ b/docs/install/Tables.md
@@ -290,11 +290,11 @@ PaddePaddle 通过编译时指定路径来实现引用各种 BLAS/CUDA/cuDNN 库
- | paddlepaddle==[版本号] 例如 paddlepaddle==3.0.0 |
+ paddlepaddle==[版本号] 例如 paddlepaddle==3.1.1 |
只支持 CPU 对应版本的 PaddlePaddle,具体版本请参见Pypi |
- | paddlepaddle-gpu==[版本号] 例如 paddlepaddle-gpu==3.0.0 |
+ paddlepaddle-gpu==[版本号] 例如 paddlepaddle-gpu==3.1.1 |
默认安装支持 CUDA 11.8 和 cuDNN 8 的对应[版本号]的 PaddlePaddle 安装包 |
@@ -303,7 +303,7 @@ PaddePaddle 通过编译时指定路径来实现引用各种 BLAS/CUDA/cuDNN 库
您可以在 [Release History](https://pypi.org/project/paddlepaddle-gpu/#history) 中找到 PaddlePaddle-gpu 的各个发行版本。
-需要注意的是,命令中 paddlepaddle-gpu==3.0.0 在 windows 环境下,会默认安装支持 CUDA 11.8 和 cuDNN 8 的对应[版本号]的 PaddlePaddle 安装包
+需要注意的是,命令中 paddlepaddle-gpu==3.1.1 在 windows 环境下,会默认安装支持 CUDA 11.8 和 cuDNN 8 的对应[版本号]的 PaddlePaddle 安装包
@@ -325,86 +325,86 @@ PaddePaddle 通过编译时指定路径来实现引用各种 BLAS/CUDA/cuDNN 库
| cpu-mkl-avx |
- paddlepaddle-3.0.0-cp38-cp38-linux_x86_64.whl |
- paddlepaddle-3.0.0-cp39-cp39-linux_x86_64.whl |
- paddlepaddle-3.0.0-cp310-cp310-linux_x86_64.whl |
- paddlepaddle-3.0.0-cp311-cp311-linux_x86_64.whl |
- paddlepaddle-3.0.0-cp312-cp312-linux_x86_64.whl |
+ paddlepaddle-3.1.1-cp38-cp38-linux_x86_64.whl |
+ paddlepaddle-3.1.1-cp39-cp39-linux_x86_64.whl |
+ paddlepaddle-3.1.1-cp310-cp310-linux_x86_64.whl |
+ paddlepaddle-3.1.1-cp311-cp311-linux_x86_64.whl |
+ paddlepaddle-3.1.1-cp312-cp312-linux_x86_64.whl |
| cuda11.8-cudnn8.6-mkl-gcc8.2-avx |
-
- paddlepaddle_gpu-3.0.0-cp38-cp38-linux_x86_64.whl |
-
- paddlepaddle_gpu-3.0.0-cp39-cp39-linux_x86_64.whl |
-
- paddlepaddle_gpu-3.0.0-cp310-cp310-linux_x86_64.whl |
-
- paddlepaddle_gpu-3.0.0-cp311-cp311-linux_x86_64.whl |
-
- paddlepaddle_gpu-3.0.0-cp312-cp312-linux_x86_64.whl |
+
+ paddlepaddle_gpu-3.1.1-cp38-cp38-linux_x86_64.whl |
+
+ paddlepaddle_gpu-3.1.1-cp39-cp39-linux_x86_64.whl |
+
+ paddlepaddle_gpu-3.1.1-cp310-cp310-linux_x86_64.whl |
+
+ paddlepaddle_gpu-3.1.1-cp311-cp311-linux_x86_64.whl |
+
+ paddlepaddle_gpu-3.1.1-cp312-cp312-linux_x86_64.whl |
cuda12.6-cudnn9.0-mkl-gcc12.2-avx |
-
- paddlepaddle_gpu-3.0.0-cp38-cp38-linux_x86_64.whl |
-
- paddlepaddle_gpu-3.0.0-cp39-cp39-linux_x86_64.whl |
-
- paddlepaddle_gpu-3.0.0-cp310-cp310-linux_x86_64.whl |
-
- paddlepaddle_gpu-3.0.0-cp311-cp311-linux_x86_64.whl |
-
- paddlepaddle_gpu-3.0.0-cp312-cp312-linux_x86_64.whl |
+
+ paddlepaddle_gpu-3.1.1-cp38-cp38-linux_x86_64.whl |
+
+ paddlepaddle_gpu-3.1.1-cp39-cp39-linux_x86_64.whl |
+
+ paddlepaddle_gpu-3.1.1-cp310-cp310-linux_x86_64.whl |
+
+ paddlepaddle_gpu-3.1.1-cp311-cp311-linux_x86_64.whl |
+
+ paddlepaddle_gpu-3.1.1-cp312-cp312-linux_x86_64.whl |
| macos-cpu-x86 |
-
- paddlepaddle-3.0.0-cp38-cp38-macosx_10_9_x86_64.whl |
-
- paddlepaddle-3.0.0-cp39-cp39-macosx_10_9_x86_64.whl |
-
- paddlepaddle-3.0.0-cp310-cp310-macosx_10_9_universal2.whl |
-
- paddlepaddle-3.0.0-cp311-cp311-macosx_10_9_universal2.whl |
-
- paddlepaddle-3.0.0-cp312-cp312-macosx_10_9_universal2.whl |
+
+ paddlepaddle-3.1.1-cp38-cp38-macosx_10_9_x86_64.whl |
+
+ paddlepaddle-3.1.1-cp39-cp39-macosx_10_9_x86_64.whl |
+
+ paddlepaddle-3.1.1-cp310-cp310-macosx_10_9_universal2.whl |
+
+ paddlepaddle-3.1.1-cp311-cp311-macosx_10_9_universal2.whl |
+
+ paddlepaddle-3.1.1-cp312-cp312-macosx_10_9_universal2.whl |
| macos-cpu-arm |
-
- paddlepaddle-3.0.0-cp38-cp38-macosx_11_0_arm64.whl |
-
- paddlepaddle-3.0.0-cp39-cp39-macosx_11_0_arm64.whl |
-
- paddlepaddle-3.0.0-cp310-cp310-macosx_11_0_arm64.whl |
-
- paddlepaddle-3.0.0-cp311-cp311-macosx_11_0_arm64.whl |
-
- paddlepaddle-3.0.0-cp312-cp312-macosx_11_0_arm64.whl |
+
+ paddlepaddle-3.1.1-cp38-cp38-macosx_11_0_arm64.whl |
+
+ paddlepaddle-3.1.1-cp39-cp39-macosx_11_0_arm64.whl |
+
+ paddlepaddle-3.1.1-cp310-cp310-macosx_11_0_arm64.whl |
+
+ paddlepaddle-3.1.1-cp311-cp311-macosx_11_0_arm64.whl |
+
+ paddlepaddle-3.1.1-cp312-cp312-macosx_11_0_arm64.whl |
| win-cpu-mkl-avx |
- paddlepaddle-3.0.0-cp38-cp38-win_amd64.whl |
- paddlepaddle-3.0.0-cp39-cp39-win_amd64.whl |
- paddlepaddle-3.0.0-cp310-cp310-win_amd64.whl |
- paddlepaddle-3.0.0-cp311-cp311-win_amd64.whl |
- paddlepaddle-3.0.0-cp312-cp312-win_amd64.whl |
+ paddlepaddle-3.1.1-cp38-cp38-win_amd64.whl |
+ paddlepaddle-3.1.1-cp39-cp39-win_amd64.whl |
+ paddlepaddle-3.1.1-cp310-cp310-win_amd64.whl |
+ paddlepaddle-3.1.1-cp311-cp311-win_amd64.whl |
+ paddlepaddle-3.1.1-cp312-cp312-win_amd64.whl |
| win-cuda11.8-cudnn8.6-mkl-vs2019-avx |
- paddlepaddle_gpu-3.0.0-cp38-cp38-win_amd64.whl |
- paddlepaddle_gpu-3.0.0-cp39-cp39-win_amd64.whl |
- paddlepaddle_gpu-3.0.0-cp310-cp310-win_amd64.whl |
- paddlepaddle_gpu-3.0.0-cp311-cp311-win_amd64.whl |
- paddlepaddle_gpu-3.0.0-cp312-cp312-win_amd64.whl |
+ paddlepaddle_gpu-3.1.1-cp38-cp38-win_amd64.whl |
+ paddlepaddle_gpu-3.1.1-cp39-cp39-win_amd64.whl |
+ paddlepaddle_gpu-3.1.1-cp310-cp310-win_amd64.whl |
+ paddlepaddle_gpu-3.1.1-cp311-cp311-win_amd64.whl |
+ paddlepaddle_gpu-3.1.1-cp312-cp312-win_amd64.whl |
| win-cuda12.6-cudnn9.0-mkl-vs2019-avx |
- paddlepaddle_gpu-3.0.0-cp38-cp38-win_amd64.whl |
- paddlepaddle_gpu-3.0.0-cp39-cp39-win_amd64.whl |
- paddlepaddle_gpu-3.0.0-cp310-cp310-win_amd64.whl |
- paddlepaddle_gpu-3.0.0-cp311-cp311-win_amd64.whl |
- paddlepaddle_gpu-3.0.0-cp312-cp312-win_amd64.whl |
+ paddlepaddle_gpu-3.1.1-cp38-cp38-win_amd64.whl |
+ paddlepaddle_gpu-3.1.1-cp39-cp39-win_amd64.whl |
+ paddlepaddle_gpu-3.1.1-cp310-cp310-win_amd64.whl |
+ paddlepaddle_gpu-3.1.1-cp311-cp311-win_amd64.whl |
+ paddlepaddle_gpu-3.1.1-cp312-cp312-win_amd64.whl |
diff --git a/docs/install/Tables_en.md b/docs/install/Tables_en.md
index c208d37633d..08a949c2e84 100644
--- a/docs/install/Tables_en.md
+++ b/docs/install/Tables_en.md
@@ -282,11 +282,11 @@ PaddePaddle implements references to various BLAS/CUDA/cuDNN libraries by specif
- | paddlepaddle==[version code] such as paddlepaddle==3.0.0 |
+ paddlepaddle==[version code] such as paddlepaddle==3.1.1 |
Only support the corresponding version of the CPU PaddlePaddle, please refer to Pypi for the specific version. |
- | paddlepaddle-gpu==[version code], such as paddlepaddle-gpu==3.0.0 |
+ paddlepaddle-gpu==[version code], such as paddlepaddle-gpu==3.1.1 |
The default installation supports the PaddlePaddle installation package corresponding to [version number] of CUDA 11.2 and cuDNN 8 |
@@ -295,7 +295,7 @@ PaddePaddle implements references to various BLAS/CUDA/cuDNN libraries by specif
You can find various distributions of PaddlePaddle-gpu in [the Release History](https://pypi.org/project/paddlepaddle-gpu/#history).
-Please note that: in the commands, paddlepaddle-gpu==3.0.0 will install the installation package of PaddlePaddle that supports CUDA 11.2 and cuDNN 8 by default under Windows environment.
+Please note that: in the commands, paddlepaddle-gpu==3.1.1 will install the installation package of PaddlePaddle that supports CUDA 11.2 and cuDNN 8 by default under Windows environment.
@@ -319,86 +319,86 @@ Please note that: in the commands, paddlepaddle-gpu==3.0.0 will i
| cpu-mkl-avx |
- paddlepaddle-3.0.0-cp38-cp38-linux_x86_64.whl |
- paddlepaddle-3.0.0-cp39-cp39-linux_x86_64.whl |
- paddlepaddle-3.0.0-cp310-cp310-linux_x86_64.whl |
- paddlepaddle-3.0.0-cp311-cp311-linux_x86_64.whl |
- paddlepaddle-3.0.0-cp312-cp312-linux_x86_64.whl |
+ paddlepaddle-3.1.1-cp38-cp38-linux_x86_64.whl |
+ paddlepaddle-3.1.1-cp39-cp39-linux_x86_64.whl |
+ paddlepaddle-3.1.1-cp310-cp310-linux_x86_64.whl |
+ paddlepaddle-3.1.1-cp311-cp311-linux_x86_64.whl |
+ paddlepaddle-3.1.1-cp312-cp312-linux_x86_64.whl |
| cuda11.8-cudnn8.6-mkl-gcc8.2-avx |
-
- paddlepaddle_gpu-3.0.0-cp38-cp38-linux_x86_64.whl |
-
- paddlepaddle_gpu-3.0.0-cp39-cp39-linux_x86_64.whl |
-
- paddlepaddle_gpu-3.0.0-cp310-cp310-linux_x86_64.whl |
-
- paddlepaddle_gpu-3.0.0-cp311-cp311-linux_x86_64.whl |
-
- paddlepaddle_gpu-3.0.0-cp312-cp312-linux_x86_64.whl |
+
+ paddlepaddle_gpu-3.1.1-cp38-cp38-linux_x86_64.whl |
+
+ paddlepaddle_gpu-3.1.1-cp39-cp39-linux_x86_64.whl |
+
+ paddlepaddle_gpu-3.1.1-cp310-cp310-linux_x86_64.whl |
+
+ paddlepaddle_gpu-3.1.1-cp311-cp311-linux_x86_64.whl |
+
+ paddlepaddle_gpu-3.1.1-cp312-cp312-linux_x86_64.whl |
cuda12.6-cudnn9.0-mkl-gcc12.2-avx |
-
- paddlepaddle_gpu-3.0.0-cp38-cp38-linux_x86_64.whl |
-
- paddlepaddle_gpu-3.0.0-cp39-cp39-linux_x86_64.whl |
-
- paddlepaddle_gpu-3.0.0-cp310-cp310-linux_x86_64.whl |
-
- paddlepaddle_gpu-3.0.0-cp311-cp311-linux_x86_64.whl |
-
- paddlepaddle_gpu-3.0.0-cp312-cp312-linux_x86_64.whl |
+
+ paddlepaddle_gpu-3.1.1-cp38-cp38-linux_x86_64.whl |
+
+ paddlepaddle_gpu-3.1.1-cp39-cp39-linux_x86_64.whl |
+
+ paddlepaddle_gpu-3.1.1-cp310-cp310-linux_x86_64.whl |
+
+ paddlepaddle_gpu-3.1.1-cp311-cp311-linux_x86_64.whl |
+
+ paddlepaddle_gpu-3.1.1-cp312-cp312-linux_x86_64.whl |
| macos-cpu-x86 |
-
- paddlepaddle-3.0.0-cp38-cp38-macosx_10_9_x86_64.whl |
-
- paddlepaddle-3.0.0-cp39-cp39-macosx_10_9_x86_64.whl |
-
- paddlepaddle-3.0.0-cp310-cp310-macosx_10_9_universal2.whl |
-
- paddlepaddle-3.0.0-cp311-cp311-macosx_10_9_universal2.whl |
-
- paddlepaddle-3.0.0-cp312-cp312-macosx_10_9_universal2.whl |
+
+ paddlepaddle-3.1.1-cp38-cp38-macosx_10_9_x86_64.whl |
+
+ paddlepaddle-3.1.1-cp39-cp39-macosx_10_9_x86_64.whl |
+
+ paddlepaddle-3.1.1-cp310-cp310-macosx_10_9_universal2.whl |
+
+ paddlepaddle-3.1.1-cp311-cp311-macosx_10_9_universal2.whl |
+
+ paddlepaddle-3.1.1-cp312-cp312-macosx_10_9_universal2.whl |
| macos-cpu-arm |
-
- paddlepaddle-3.0.0-cp38-cp38-macosx_11_0_arm64.whl |
-
- paddlepaddle-3.0.0-cp39-cp39-macosx_11_0_arm64.whl |
-
- paddlepaddle-3.0.0-cp310-cp310-macosx_11_0_arm64.whl |
-
- paddlepaddle-3.0.0-cp311-cp311-macosx_11_0_arm64.whl |
-
- paddlepaddle-3.0.0-cp312-cp312-macosx_11_0_arm64.whl |
+
+ paddlepaddle-3.1.1-cp38-cp38-macosx_11_0_arm64.whl |
+
+ paddlepaddle-3.1.1-cp39-cp39-macosx_11_0_arm64.whl |
+
+ paddlepaddle-3.1.1-cp310-cp310-macosx_11_0_arm64.whl |
+
+ paddlepaddle-3.1.1-cp311-cp311-macosx_11_0_arm64.whl |
+
+ paddlepaddle-3.1.1-cp312-cp312-macosx_11_0_arm64.whl |
| win-cpu-mkl-avx |
- paddlepaddle-3.0.0-cp38-cp38-win_amd64.whl |
- paddlepaddle-3.0.0-cp39-cp39-win_amd64.whl |
- paddlepaddle-3.0.0-cp310-cp310-win_amd64.whl |
- paddlepaddle-3.0.0-cp311-cp311-win_amd64.whl |
- paddlepaddle-3.0.0-cp312-cp312-win_amd64.whl |
+ paddlepaddle-3.1.1-cp38-cp38-win_amd64.whl |
+ paddlepaddle-3.1.1-cp39-cp39-win_amd64.whl |
+ paddlepaddle-3.1.1-cp310-cp310-win_amd64.whl |
+ paddlepaddle-3.1.1-cp311-cp311-win_amd64.whl |
+ paddlepaddle-3.1.1-cp312-cp312-win_amd64.whl |
| win-cuda11.8-cudnn8.6-mkl-vs2019-avx |
- paddlepaddle_gpu-3.0.0-cp38-cp38-win_amd64.whl |
- paddlepaddle_gpu-3.0.0-cp39-cp39-win_amd64.whl |
- paddlepaddle_gpu-3.0.0-cp310-cp310-win_amd64.whl |
- paddlepaddle_gpu-3.0.0-cp311-cp311-win_amd64.whl |
- paddlepaddle_gpu-3.0.0-cp312-cp312-win_amd64.whl |
+ paddlepaddle_gpu-3.1.1-cp38-cp38-win_amd64.whl |
+ paddlepaddle_gpu-3.1.1-cp39-cp39-win_amd64.whl |
+ paddlepaddle_gpu-3.1.1-cp310-cp310-win_amd64.whl |
+ paddlepaddle_gpu-3.1.1-cp311-cp311-win_amd64.whl |
+ paddlepaddle_gpu-3.1.1-cp312-cp312-win_amd64.whl |
| win-cuda12.6-cudnn9.0-mkl-vs2019-avx |
- paddlepaddle_gpu-3.0.0-cp38-cp38-win_amd64.whl |
- paddlepaddle_gpu-3.0.0-cp39-cp39-win_amd64.whl |
- paddlepaddle_gpu-3.0.0-cp310-cp310-win_amd64.whl |
- paddlepaddle_gpu-3.0.0-cp311-cp311-win_amd64.whl |
- paddlepaddle_gpu-3.0.0-cp312-cp312-win_amd64.whl |
+ paddlepaddle_gpu-3.1.1-cp38-cp38-win_amd64.whl |
+ paddlepaddle_gpu-3.1.1-cp39-cp39-win_amd64.whl |
+ paddlepaddle_gpu-3.1.1-cp310-cp310-win_amd64.whl |
+ paddlepaddle_gpu-3.1.1-cp311-cp311-win_amd64.whl |
+ paddlepaddle_gpu-3.1.1-cp312-cp312-win_amd64.whl |
diff --git a/docs/install/compile/linux-compile-by-make.md b/docs/install/compile/linux-compile-by-make.md
index 4ace763038d..32beea661a9 100644
--- a/docs/install/compile/linux-compile-by-make.md
+++ b/docs/install/compile/linux-compile-by-make.md
@@ -3,10 +3,10 @@
## 环境准备
* **Linux 版本 (64 bit)**
- * **CentOS 7 (GPU 版本支持 CUDA 11.0 - 12.0)**
- * **Ubuntu 18.04 (GPU 版本支持 CUDA 11.0 - 12.0)**
- * **Ubuntu 20.04 (GPU 版本支持 CUDA 11.0 - 12.0)**
-* **Python 版本 3.8/3.9/3.10/3.11/3.12 (64 bit)**
+ * **Ubuntu 20.04 (GPU 版本支持 CUDA 11.8 - 12.9)**
+ * **Ubuntu 22.04 (GPU 版本支持 CUDA 11.8 - 12.9)**
+ * **Ubuntu 24.04 (GPU 版本支持 CUDA 11.8 - 12.9)**
+* **Python 版本 3.9/3.10/3.11/3.12/3.13 (64 bit)**
## 选择 CPU/GPU
@@ -35,7 +35,7 @@ Docker 环境中已预装好编译 Paddle 需要的各种依赖,相较本机
使用 Docker 编译 PaddlePaddle,您需要:
-- 在本地主机上[安装 Docker](https://docs.docker.com/engine/install/)
+- 在本地主机上[安装 Docker](https://docs.docker.com/engine/install/),推荐使用[Docker 列表](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/docker/docker_list.html)中的镜像进行编译。
- 如需在 Linux 开启 GPU 支持,请[安装 NVIDIA Container Toolkit
](https://github.com/NVIDIA/nvidia-container-toolkit)
@@ -65,7 +65,7 @@ cd Paddle
* GPU 版的 PaddlePaddle:
```
- docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2
+ docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:cuda126-dev
```
如果您的机器不在中国大陆地区,可以直接从 [DockerHub 中的 paddle 镜像仓库](https://hub.docker.com/r/paddlepaddle/paddle/tags) 拉取镜像:
@@ -77,10 +77,10 @@ cd Paddle
* GPU 版的 PaddlePaddle(**建议使用较新的镜像,并确保已经成功安装 NVIDIA Container Toolkit**):
```
- docker pull paddlepaddle/paddle:latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2
+ docker pull paddlepaddle/paddle:cuda126-dev
```
-上例中,`latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2` 仅作示意用,表示安装 GPU 版的镜像。如果您还想安装其他 cuda/cudnn 版本的镜像,可以将其替换成其他版本(建议拉取最新的 GPU 版本)。
+上例中,`cuda126-dev` 仅作示意用,表示安装 GPU 版的镜像。如果您还想安装其他 cuda/cudnn 版本的镜像,可以将其替换成其他版本(建议拉取最新的 GPU 版本)。
您可以访问[DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/)获取与您机器适配的镜像。
@@ -110,7 +110,7 @@ cd Paddle
用从百度拉取的镜像创建容器
```
- docker run --gpus all --name paddle-test -v $PWD:/paddle --network=host -it ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2 /bin/bash
+ docker run --gpus all --name paddle-test -v $PWD:/paddle --network=host -it ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:cuda126-dev /bin/bash
```
- `--gpus all`: 在 Docker 容器中允许使用 gpu;
@@ -121,11 +121,11 @@ cd Paddle
- `-it`: 与宿主机保持交互状态;
- - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2`:使用名为`ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2`的镜像创建 Docker 容器,/bin/bash 进入容器后启动/bin/bash 命令。
+ - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:cuda126-dev`:使用名为`ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:cuda126-dev`的镜像创建 Docker 容器,/bin/bash 进入容器后启动/bin/bash 命令。
若使用的是从 DockerHub 拉取的镜像创建容器,则修改镜像名即可:
```
- docker run --gpus all --name paddle-test -v $PWD:/paddle --network=host -it paddlepaddle/paddle:latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2 /bin/bash
+ docker run --gpus all --name paddle-test -v $PWD:/paddle --network=host -it paddlepaddle/paddle:cuda126-dev /bin/bash
```
注意:
@@ -145,7 +145,7 @@ cd /paddle
git checkout develop
```
-paddle 支持 Python 3.8 以上版本
+paddle 支持 Python 3.9 以上版本
#### 7. 创建并进入/paddle/build 路径下:
@@ -160,7 +160,7 @@ mkdir -p /paddle/build && cd /paddle/build
pip3.10 install -r /paddle/python/requirements.txt
```
-注意:以上用 Python3.10 命令来举例,如您的 Python 版本为 3.8/3.9/3.11/3.12,请将上述命令中的 pip3.10 改成 pip3.8/pip3.9/pip3.11/pip3.12
+注意:以上用 Python3.10 命令来举例,如您的 Python 版本为 3.9/3.11/3.12/3.13,请将上述命令中的 pip3.10 改成 pip3.9/pip3.11/pip3.12/pip3.13
#### 9. 执行 cmake:
@@ -171,7 +171,7 @@ pip3.10 install -r /paddle/python/requirements.txt
* 对于需要编译**GPU 版本 PaddlePaddle**的用户:
```
- cmake .. -DPY_VERSION=3.10 -DWITH_GPU=ON
+ cmake .. -DPY_VERSION=3.10 -DWITH_GPU=ON -DWITH_DISTRIBUTE=ON
```
- 具体编译选项含义请参见[编译选项表](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#Compile)
@@ -205,7 +205,7 @@ pip3.10 install -U [whl 包的名字]
```
注意:
-以上用 Python3.10 命令来举例,如您的 Python 版本为 3.8/3.9/3.11/3.12,请将上述命令中的 pip3.10 改成 pip3.8/pip3.9/pip3.11/pip3.12。
+以上用 Python3.10 命令来举例,如您的 Python 版本为 3.9/3.11/3.12/3.13,请将上述命令中的 pip3.10 改成 pip3.9/pip3.11/pip3.12/pip3.13。
#### 恭喜,至此您已完成 PaddlePaddle 的编译安装。您只需要进入 Docker 容器后运行 PaddlePaddle,即可开始使用。更多 Docker 使用请参见[Docker 官方文档](https://docs.docker.com)
@@ -246,17 +246,15 @@ uname -m && cat /etc/*release
#### 3. 安装 NCCL(可选)
-* 如果您需要使用 GPU 多卡,请确保您已经正确安装 nccl2,或者按照以下指令安装 nccl2(这里提供的是 CUDA11.2,cuDNN8 下 nccl2 的安装指令,更多版本的安装信息请参考 NVIDIA[官方网站](https://developer.nvidia.com/nccl)):
+* 如果您需要使用 GPU 多卡,请确保您已经正确安装 nccl2,或者按照以下指令安装 nccl2(这里提供的是 CUDA11.8,cuDNN8 下 nccl2 的安装指令,更多版本的安装信息请参考 NVIDIA[官方网站](https://developer.nvidia.com/nccl)):
```
- rm -f /usr/local/lib/libnccl.so
- wget --no-check-certificate -q https://nccl2-deb.cdn.bcebos.com/libnccl-2.10.3-1+cuda11.4.x86_64.rpm
- wget --no-check-certificate -q https://nccl2-deb.cdn.bcebos.com/libnccl-devel-2.10.3-1+cuda11.4.x86_64.rpm
- wget --no-check-certificate -q https://nccl2-deb.cdn.bcebos.com/libnccl-static-2.10.3-1+cuda11.4.x86_64.rpm
- rpm -ivh libnccl-2.10.3-1+cuda11.4.x86_64.rpm
- rpm -ivh libnccl-devel-2.10.3-1+cuda11.4.x86_64.rpm
- rpm -ivh libnccl-static-2.10.3-1+cuda11.4.x86_64.rpm
+ wget -q https://nccl2-deb.cdn.bcebos.com/nccl_2.16.2-1+cuda11.8_x86_64.txz --no-check-certificate --no-proxy
+ tar xf nccl_2.16.2-1+cuda11.8_x86_64.txz
+ cp -a nccl_2.16.2-1+cuda11.8_x86_64/include/* /usr/include/
+ cp -a nccl_2.16.2-1+cuda11.8_x86_64/lib/* /usr/lib64
+ rm -rf nccl_2.16.2-1+cuda11.8_x86_64 nccl_2.16.2-1+cuda11.8_x86_64.txz
```
@@ -319,13 +317,13 @@ make -j8 && make install
(请参照 Python 官方流程安装)
-* c.(Only For Python3)设置 Python3 相关的环境变量,这里以 python3.10 版本示例,请替换成您使用的版本(3.8、3.9、3.10、3.11、3.12):
+* c.(Only For Python3)设置 Python3 相关的环境变量,这里以 python3.10 版本示例,请替换成您使用的版本(3.9、3.10、3.11、3.12、3.13):
1. 首先使用
```
find `dirname $(dirname $(which python3))` -name "libpython3.so"
```
- 找到 Python lib 的路径,如果是 3.8、3.9、3.10、3.11、3.12,请将`python3`改成`python3.8`、`python3.9`,`python3.10`,`python3.11`,`python3.12`,然后将下面[python-lib-path]替换为找到文件路径
+ 找到 Python lib 的路径,如果是 3.9、3.10、3.11、3.12、3.13,请将`python3`改成`python3.9`,`python3.10`,`python3.11`,`python3.12`,`python3.13`,然后将下面[python-lib-path]替换为找到文件路径
2. 设置 PYTHON_LIBRARIES:
```
@@ -349,7 +347,7 @@ make -j8 && make install
```
(这里将[python-lib-path]的最后两级目录替换为/bin/)
-* d. 安装虚环境`virtualenv`以及`virtualenvwrapper`并创建名为`paddle-venv`的虚环境:(请注意对应 python 版本的 pip3 的命令,如 pip3.8、pip3.9、pip3.10、pip3.11、pip3.12)
+* d. 安装虚环境`virtualenv`以及`virtualenvwrapper`并创建名为`paddle-venv`的虚环境:(请注意对应 python 版本的 pip3 的命令,如 pip3.9、pip3.10、pip3.11、pip3.12、pip3.13)
1. 安装`virtualenv`
```
@@ -432,7 +430,7 @@ mkdir build && cd build
pip3.10 install -r /paddle/python/requirements.txt
```
-注意:以上用 Python3.10 命令来举例,如您的 Python 版本为 3.8/3.9/3.11/3.12,请将上述命令中的 pip3.10 改成 pip3.8/pip3.9/pip3.11/pip3.12
+注意:以上用 Python3.10 命令来举例,如您的 Python 版本为 3.9/3.11/3.12/3.13,请将上述命令中的 pip3.10 改成 pip3.9/pip3.11/pip3.12/pip3.13
#### 12. 执行 cmake:
@@ -450,19 +448,17 @@ pip3.10 install -r /paddle/python/requirements.txt
> 请注意 PY_VERSION 参数更换为您需要的 python 版本
-* 对于需要编译**GPU 版本 PaddlePaddle**的用户:(** CUDA11.0 - CUDA12.0 **)
+* 对于需要编译**GPU 版本 PaddlePaddle**的用户:(** CUDA11.8 - CUDA12.9 **)
- 1. 请确保您已经正确安装 nccl2,或者按照以下指令安装 nccl2(这里提供的是 CUDA11.2,cuDNN8 下 nccl2 的安装指令,更多版本的安装信息请参考 NVIDIA[官方网站](https://developer.nvidia.com/nccl)):
+ 1. 请确保您已经正确安装 nccl2,或者按照以下指令安装 nccl2(这里提供的是 CUDA11.8,cuDNN8 下 nccl2 的安装指令,更多版本的安装信息请参考 NVIDIA[官方网站](https://developer.nvidia.com/nccl)):
```
- rm -f /usr/local/lib/libnccl.so
- wget --no-check-certificate -q https://nccl2-deb.cdn.bcebos.com/libnccl-2.10.3-1+cuda11.4.x86_64.rpm
- wget --no-check-certificate -q https://nccl2-deb.cdn.bcebos.com/libnccl-devel-2.10.3-1+cuda11.4.x86_64.rpm
- wget --no-check-certificate -q https://nccl2-deb.cdn.bcebos.com/libnccl-static-2.10.3-1+cuda11.4.x86_64.rpm
- rpm -ivh libnccl-2.10.3-1+cuda11.4.x86_64.rpm
- rpm -ivh libnccl-devel-2.10.3-1+cuda11.4.x86_64.rpm
- rpm -ivh libnccl-static-2.10.3-1+cuda11.4.x86_64.rpm
+ wget -q https://nccl2-deb.cdn.bcebos.com/nccl_2.16.2-1+cuda11.8_x86_64.txz --no-check-certificate --no-proxy
+ tar xf nccl_2.16.2-1+cuda11.8_x86_64.txz
+ cp -a nccl_2.16.2-1+cuda11.8_x86_64/include/* /usr/include/
+ cp -a nccl_2.16.2-1+cuda11.8_x86_64/lib/* /usr/lib64
+ rm -rf nccl_2.16.2-1+cuda11.8_x86_64 nccl_2.16.2-1+cuda11.8_x86_64.txz
```
@@ -471,10 +467,10 @@ pip3.10 install -r /paddle/python/requirements.txt
2. 如果您已经正确安装了`nccl2`,就可以开始 cmake 了:(*For Python3: 请给 PY_VERSION 参数配置正确的 python 版本*)
```
- cmake .. -DPYTHON_EXECUTABLE:FILEPATH=[您可执行的 Python3 的路径] -DPYTHON_INCLUDE_DIR:PATH=[之前的 PYTHON_INCLUDE_DIRS] -DPYTHON_LIBRARY:FILEPATH=[之前的 PYTHON_LIBRARY] -DWITH_GPU=ON
+ cmake .. -DPYTHON_EXECUTABLE:FILEPATH=[您可执行的 Python3 的路径] -DPYTHON_INCLUDE_DIR:PATH=[之前的 PYTHON_INCLUDE_DIRS] -DPYTHON_LIBRARY:FILEPATH=[之前的 PYTHON_LIBRARY] -DWITH_GPU=ON -DWITH_DISTRIBUTE=ON
```
-注意:以上涉及 Python3 的命令,用 Python3.10 来举例,如您的 Python 版本为 3.8/3.9/3.11/3.12,请将上述命令中的 Python3.10 改成 Python3.8/Python3.9/Python3.11/Python3.12
+注意:以上涉及 Python3 的命令,用 Python3.10 来举例,如您的 Python 版本为 3.9/3.11/3.12/3.13,请将上述命令中的 Python3.10 改成 Python3.9/Python3.11/Python3.12/Python3.13
diff --git a/docs/install/compile/linux-compile-by-make_en.md b/docs/install/compile/linux-compile-by-make_en.md
index 25065a4ed2a..4062e7b9348 100644
--- a/docs/install/compile/linux-compile-by-make_en.md
+++ b/docs/install/compile/linux-compile-by-make_en.md
@@ -3,10 +3,10 @@
## Environment preparation
* **Linux version (64 bit)**
- * **CentOS 7 (GPU 版本支持 CUDA 11.0 - 12.0)**
- * **Ubuntu 18.04 (GPU 版本支持 CUDA 11.0 - 12.0)**
- * **Ubuntu 20.04 (GPU 版本支持 CUDA 11.0 - 12.0)**
-* **Python 版本 3.8/3.9/3.10/3.11/3.12 (64 bit)**
+ * **Ubuntu 20.04 (GPU version supports CUDA 11.8 - 12.9)**
+ * **Ubuntu 22.04 (GPU version supports CUDA 11.8 - 12.9)**
+ * **Ubuntu 24.04 (GPU version supports CUDA 11.8 - 12.9)**
+* **Python version 3.9/3.10/3.11/3.12/3.13 (64 bit)**
## Choose CPU/GPU
@@ -35,7 +35,7 @@ The dependencies required for compiling Paddle are pre-installed in the Docker e
Compiling PaddlePaddle with Docker,you need:
-- On the local host [Install Docker](https://docs.docker.com/engine/install/)
+- [Install Docker](https://docs.docker.com/engine/install/) on the local host. We recommend compiling with one of the images in the [Docker list](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/docker/docker_list.html).
- To enable GPU support on Linux, please [Install NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-container-toolkit)
@@ -63,7 +63,7 @@ For domestic users, when downloading docker is slow due to network problems, you
* GPU version of PaddlePaddle:
```
- docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2
+ docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:cuda126-dev
```
If your machine is not in mainland China, you can pull the image directly from DockerHub:
@@ -75,10 +75,10 @@ If your machine is not in mainland China, you can pull the image directly from D
* GPU version of PaddlePaddle:
```
- docker pull paddlepaddle/paddle:latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2
+ docker pull paddlepaddle/paddle:cuda126-dev
```
-In the above example, `latest-dev-cuda11.2-cudnn8.2-trt8.0-gcc82` is only for illustration, indicating that the GPU version of the image is installed. If you want to install another `cuda/cudnn` version of the image, you can replace it with `latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2` etc.
+In the above example, `cuda126-dev` is only for illustration, indicating that the GPU version of the image is installed. If you want to install an image with another `cuda/cudnn` version, you can replace it with the corresponding tag.
You can see [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to get the image that matches your machine.
@@ -113,7 +113,7 @@ You can see [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to g
Using the image pulled from Baidu.
```
- docker run --gpus all --name paddle-test -v $PWD:/paddle --network=host -it ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2 /bin/bash
+ docker run --gpus all --name paddle-test -v $PWD:/paddle --network=host -it ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:cuda126-dev /bin/bash
```
- `--gpus all`: gpu resources can be used in Docker container;
@@ -127,11 +127,11 @@ You can see [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to g
- `-it`: keeps interaction with the host;
- - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2`: use the image named `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2` to create Docker container, /bin/bash start the /bin/bash command after entering the container.
+ - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:cuda126-dev`: creates the Docker container from the image named `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:cuda126-dev`; the trailing `/bin/bash` starts a Bash shell after entering the container.
If you are using the image pulled from DockerHub, just modify the image name.
```
- docker run --gpus all --name paddle-test -v $PWD:/paddle --network=host -it paddlepaddle/paddle:latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2 /bin/bash
+ docker run --gpus all --name paddle-test -v $PWD:/paddle --network=host -it paddlepaddle/paddle:cuda126-dev /bin/bash
```
Note:
@@ -150,7 +150,7 @@ cd /paddle
git checkout develop
```
-Paddle supports Python version 3.8 and above
+Paddle supports Python version 3.9 and above
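The version requirement above can be checked before starting a build. The following is a minimal sketch, not part of Paddle: `python_version_supported` is a hypothetical helper name, and it only mirrors the "3.9 and above" wording without enforcing any upper bound.

```shell
# Hypothetical helper (not shipped with Paddle): succeed when a
# "major.minor" version string is 3.9 or newer.
python_version_supported() {
    major="${1%%.*}"   # text before the first dot, e.g. "3"
    minor="${1#*.}"    # text after the first dot, e.g. "10"
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 9 ]; }
}
```

A possible use is `python_version_supported 3.10 && echo "ok to build"`.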
#### 7. Create and enter the /paddle/build path:
@@ -166,7 +166,7 @@ mkdir -p /paddle/build && cd /paddle/build
pip3.10 install protobuf
```
-Note: We used Python3.10 command as an example above, if the version of your Python is 3.8/3.9/3.11/3.12, please change pip3.10 in the commands to pip3.8/pip3.9/pip3.11/pip3.12
+Note: The commands above use Python 3.10 as an example. If your Python version is 3.9/3.11/3.12/3.13, change pip3.10 in the commands to pip3.9/pip3.11/pip3.12/pip3.13
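The substitution described in the note can be sketched as a small shell helper. This is only an illustration: `pip_cmd_for` and `current_python_version` are hypothetical names, not commands provided by Paddle or pip.

```shell
# Hypothetical helper: map a Python version string such as "3.13"
# to the matching pip command name used in this guide.
pip_cmd_for() {
    printf 'pip%s\n' "$1"
}

# Hypothetical helper: print the "major.minor" version of whatever
# interpreter `python3` resolves to on this machine.
current_python_version() {
    python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])'
}

pip_cmd_for 3.13
```

An alternative that avoids guessing the command name is to invoke pip through the interpreter itself, for example `python3.13 -m pip install protobuf`.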
- Installing patchelf, PatchELF is a small and useful program for modifying the dynamic linker and RPATH of ELF executables.
@@ -188,7 +188,7 @@ pip3.10 install -r /paddle/python/requirements.txt
* For users who need to compile the **GPU version PaddlePaddle**:
```
- cmake .. -DPY_VERSION=3.10 -DWITH_GPU=ON
+ cmake .. -DPY_VERSION=3.10 -DWITH_GPU=ON -DWITH_DISTRIBUTE=ON
```
- For details on the compilation options, see the [compilation options table](https://www.paddlepaddle.org.cn/documentation/docs/en/develop/install/Tables.html#Compile).
@@ -221,7 +221,7 @@ pip3.10 install -U [whl package name]
```
Note:
-We used Python3.10 command as an example above, if the version of your Python is 3.8/3.9/3.11/3.12, please change pip3.10 in the commands to pip3.8/pip3.9/pip3.11/3.12.
+The commands above use Python 3.10 as an example. If your Python version is 3.9/3.11/3.12/3.13, change pip3.10 in the commands to pip3.9/pip3.11/pip3.12/pip3.13.
#### Congratulations, now that you have successfully installed PaddlePaddle using Docker, you only need to run PaddlePaddle after entering the Docker container. For more Docker usage, please refer to the [official Docker documentation](https://docs.docker.com/).
@@ -252,19 +252,17 @@ uname -m && cat /etc/*release
#### 3. Install NCCL (optional)
-* If you need to use multi card environment, please make sure that you have installed nccl2 correctly, or install nccl2 according to the following instructions (here is the installation instructions of nccl2 under CUDA11.2 and cuDNN8. For more version of installation information, please refer to NVIDIA[official website](https://developer.nvidia.com/nccl)):
+* If you need a multi-GPU environment, make sure that nccl2 is installed correctly, or install it with the following commands (these are the nccl2 installation instructions for CUDA 11.8 and cuDNN 8; for other versions, please refer to the NVIDIA [official website](https://developer.nvidia.com/nccl)):
* **CentOS system can refer to the following commands**
```
- rm -f /usr/local/lib/libnccl.so
- wget --no-check-certificate -q https://nccl2-deb.cdn.bcebos.com/libnccl-2.10.3-1+cuda11.4.x86_64.rpm
- wget --no-check-certificate -q https://nccl2-deb.cdn.bcebos.com/libnccl-devel-2.10.3-1+cuda11.4.x86_64.rpm
- wget --no-check-certificate -q https://nccl2-deb.cdn.bcebos.com/libnccl-static-2.10.3-1+cuda11.4.x86_64.rpm
- rpm -ivh libnccl-2.10.3-1+cuda11.4.x86_64.rpm
- rpm -ivh libnccl-devel-2.10.3-1+cuda11.4.x86_64.rpm
- rpm -ivh libnccl-static-2.10.3-1+cuda11.4.x86_64.rpm
+ wget -q https://nccl2-deb.cdn.bcebos.com/nccl_2.16.2-1+cuda11.8_x86_64.txz --no-check-certificate --no-proxy
+ tar xf nccl_2.16.2-1+cuda11.8_x86_64.txz
+ cp -a nccl_2.16.2-1+cuda11.8_x86_64/include/* /usr/include/
+ cp -a nccl_2.16.2-1+cuda11.8_x86_64/lib/* /usr/lib64
+ rm -rf nccl_2.16.2-1+cuda11.8_x86_64 nccl_2.16.2-1+cuda11.8_x86_64.txz
```
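After the commands above, it can be useful to confirm that the files landed where the `cp` steps put them. The sketch below is an assumption-laden check, not an official NCCL or Paddle tool: `check_nccl_files` is a hypothetical name, and it assumes the archive provides `nccl.h` and a `libnccl.so*` library.

```shell
# Hypothetical check: report whether the NCCL header and shared library
# exist under the given directories. The defaults match the cp targets
# used in the install commands above.
check_nccl_files() {
    include_dir="${1:-/usr/include}"
    lib_dir="${2:-/usr/lib64}"
    if [ ! -f "$include_dir/nccl.h" ]; then
        echo "missing: $include_dir/nccl.h"
        return 1
    fi
    # Accept libnccl.so as well as versioned names such as libnccl.so.2.
    if ! ls "$lib_dir"/libnccl.so* >/dev/null 2>&1; then
        echo "missing: $lib_dir/libnccl.so*"
        return 1
    fi
    echo "nccl files found"
}
```

Calling `check_nccl_files` with no arguments inspects `/usr/include` and `/usr/lib64`; pass two directories to check a different install prefix.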
@@ -325,15 +323,15 @@ make -j8 && make install
* b. Install pip:
- (Please refer to the official Python installation process, and ensure that the pip3 version 20.2.2 and above, please note that in python3.8 and above, pip3 does not necessarily correspond to the python version, such as python3.10 default only Pip3.10)
+ (Please refer to the official Python installation process and make sure that the pip3 version is 20.2.2 or above. Note that with Python 3.9 and above, pip3 does not necessarily correspond to your Python version; for example, Python 3.10 provides only pip3.10 by default.)
-* c. (Only For Python3) set Python3 related environment variables, here is python3.10 version example, please replace with the version you use (3.8, 3.9, 3.11, 3.12):
+* c. (Only For Python3) Set the Python3-related environment variables. The example below uses python3.10; please replace it with the version you use (3.9, 3.11, 3.12, 3.13):
1. First find the path to the Python lib using
```
find `dirname $(dirname $(which python3))` -name "libpython3.so"
```
- If it is 3.8/3.9/3.10/3.11/3.12, change `python3` to `python3.8`, `python3.9`, `python3.10`, `python3.11`, `python3.12`, then replace [python-lib-path] in the following steps with the file path found.
+ If it is 3.9/3.10/3.11/3.12/3.13, change `python3` to `python3.9`, `python3.10`, `python3.11`, `python3.12`, `python3.13`, then replace [python-lib-path] in the following steps with the file path found.
2. Set PYTHON_LIBRARIES:
```
@@ -357,7 +355,7 @@ make -j8 && make install
```
(here replace the last two levels content of [python-lib-path] with /bin/)
-* d. Install the virtual environment `virtualenv` and `virtualenvwrapper` and create a virtual environment called `paddle-venv`: (please note the pip3 commands corresponding to the python version, such as pip3.8, pip3.9, pip3.10, pip3.11, pip3.12)
+* d. Install the virtual environment `virtualenv` and `virtualenvwrapper` and create a virtual environment called `paddle-venv`: (please note the pip3 commands corresponding to the python version, such as pip3.9, pip3.10, pip3.11, pip3.12, pip3.13)
1. Install `virtualenv`:
```
@@ -425,7 +423,7 @@ git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle
```
-#### 9. Switch to develop branch for compilation (Paddle supports Python version 3.8 and above):
+#### 9. Switch to develop branch for compilation (Paddle supports Python version 3.9 and above):
```
git checkout develop
@@ -455,16 +453,14 @@ mkdir build && cd build
* For users who need to compile the **GPU version PaddlePaddle**:
- 1. Please make sure that you have installed nccl2 correctly, or install nccl2 according to the following instructions (here is ubuntu 20.04, CUDA11.2, cuDNN8 nccl2 installation instructions, for more information on the installation information please refer to the [NVIDIA official website](https://developer.nvidia.com/nccl/nccl-download)):
+ 1. Please make sure that you have installed nccl2 correctly, or install nccl2 with the following commands (these are the nccl2 installation instructions for Ubuntu 20.04, CUDA 11.8 and cuDNN 8; for other versions, please refer to the [NVIDIA official website](https://developer.nvidia.com/nccl/nccl-download)):
```
- rm -f /usr/local/lib/libnccl.so
- wget --no-check-certificate -q https://nccl2-deb.cdn.bcebos.com/libnccl-2.10.3-1+cuda11.4.x86_64.rpm
- wget --no-check-certificate -q https://nccl2-deb.cdn.bcebos.com/libnccl-devel-2.10.3-1+cuda11.4.x86_64.rpm
- wget --no-check-certificate -q https://nccl2-deb.cdn.bcebos.com/libnccl-static-2.10.3-1+cuda11.4.x86_64.rpm
- rpm -ivh libnccl-2.10.3-1+cuda11.4.x86_64.rpm
- rpm -ivh libnccl-devel-2.10.3-1+cuda11.4.x86_64.rpm
- rpm -ivh libnccl-static-2.10.3-1+cuda11.4.x86_64.rpm
+ wget -q https://nccl2-deb.cdn.bcebos.com/nccl_2.16.2-1+cuda11.8_x86_64.txz --no-check-certificate --no-proxy
+ tar xf nccl_2.16.2-1+cuda11.8_x86_64.txz
+ cp -a nccl_2.16.2-1+cuda11.8_x86_64/include/* /usr/include/
+ cp -a nccl_2.16.2-1+cuda11.8_x86_64/lib/* /usr/lib64
+ rm -rf nccl_2.16.2-1+cuda11.8_x86_64 nccl_2.16.2-1+cuda11.8_x86_64.txz
```
@@ -472,11 +468,11 @@ mkdir build && cd build
```
- cmake .. -DPYTHON_EXECUTABLE:FILEPATH=[您可执行的 Python3 的路径] -DPYTHON_INCLUDE_DIR:PATH=[之前的 PYTHON_INCLUDE_DIRS] -DPYTHON_LIBRARY:FILEPATH=[之前的 PYTHON_LIBRARY] -DWITH_GPU=ON
+ cmake .. -DPYTHON_EXECUTABLE:FILEPATH=[path to your Python3 executable] -DPYTHON_INCLUDE_DIR:PATH=[the PYTHON_INCLUDE_DIRS set earlier] -DPYTHON_LIBRARY:FILEPATH=[the PYTHON_LIBRARY set earlier] -DWITH_GPU=ON -DWITH_DISTRIBUTE=ON
```
-Note: For the command involving Python 3, we use Python 3.10 as an example above, if the version of your Python is 3.8/3.9/3.11/3.12, please change Python3.10 in the commands to Python3.8/Python3.9/Python3.11/Python3.12
+Note: The commands involving Python 3 above use Python 3.10 as an example. If your Python version is 3.9/3.11/3.12/3.13, change Python3.10 in the commands to Python3.9/Python3.11/Python3.12/Python3.13
diff --git a/docs/install/compile/linux-compile-by-ninja.md b/docs/install/compile/linux-compile-by-ninja.md
index 9a5cdc36ea7..2720e294a45 100644
--- a/docs/install/compile/linux-compile-by-ninja.md
+++ b/docs/install/compile/linux-compile-by-ninja.md
@@ -3,10 +3,10 @@
## 环境准备
* **Linux 版本 (64 bit)**
- * **CentOS 7 (GPU 版本支持 CUDA 11.0 - 12.0)**
- * **Ubuntu 18.04 (GPU 版本支持 CUDA 11.0 - 12.0)**
- * **Ubuntu 20.04 (GPU 版本支持 CUDA 11.0 - 12.0)**
-* **Python 版本 3.8/3.9/3.10/3.11/3.12 (64 bit)**
+ * **Ubuntu 20.04 (GPU 版本支持 CUDA 11.8 - 12.9)**
+ * **Ubuntu 22.04 (GPU 版本支持 CUDA 11.8 - 12.9)**
+ * **Ubuntu 24.04 (GPU 版本支持 CUDA 11.8 - 12.9)**
+* **Python 版本 3.9/3.10/3.11/3.12/3.13 (64 bit)**
## 选择 CPU/GPU
@@ -35,7 +35,7 @@ Docker 环境中已预装好编译 Paddle 需要的各种依赖,相较本机
使用 Docker 编译 PaddlePaddle,您需要:
-- 在本地主机上[安装 Docker](https://docs.docker.com/engine/install/)
+- 在本地主机上[安装 Docker](https://docs.docker.com/engine/install/),推荐使用[Docker 列表](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/docker/docker_list.html)中的镜像进行编译。
- 如需在 Linux 开启 GPU 支持,请[安装 NVIDIA Container Toolkit
](https://github.com/NVIDIA/nvidia-container-toolkit)
@@ -65,7 +65,7 @@ cd Paddle
* GPU 版的 PaddlePaddle:
```
- docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2
+ docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:cuda126-dev
```
如果您的机器不在中国大陆地区,可以直接从 [DockerHub 中的 paddle 镜像仓库](https://hub.docker.com/r/paddlepaddle/paddle/tags) 拉取镜像:
@@ -77,10 +77,10 @@ cd Paddle
* GPU 版的 PaddlePaddle(**建议使用较新的镜像,并确保已经成功安装 NVIDIA Container Toolkit**):
```
- docker pull paddlepaddle/paddle:latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2
+ docker pull paddlepaddle/paddle:cuda126-dev
```
-上例中,`latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2` 仅作示意用,表示安装 GPU 版的镜像。如果您还想安装其他 cuda/cudnn 版本的镜像,可以将其替换成其他版本(建议拉取最新的 GPU 版本)。
+上例中,`cuda126-dev` 仅作示意用,表示安装 GPU 版的镜像。如果您还想安装其他 cuda/cudnn 版本的镜像,可以将其替换成其他版本(建议拉取最新的 GPU 版本)。
您可以访问[DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/)获取与您机器适配的镜像。
@@ -110,7 +110,7 @@ cd Paddle
用从百度拉取的镜像创建容器
```
- docker run --gpus all --name paddle-test -v $PWD:/paddle --network=host -it ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2 /bin/bash
+ docker run --gpus all --name paddle-test -v $PWD:/paddle --network=host -it ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:cuda126-dev /bin/bash
```
- `--gpus all`: 在 Docker 容器中允许使用 gpu;
@@ -121,11 +121,11 @@ cd Paddle
- `-it`: 与宿主机保持交互状态;
- - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2`:使用名为`ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle`, tag 为`latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2`的镜像创建 Docker 容器,/bin/bash 进入容器后启动/bin/bash 命令。
+ - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:cuda126-dev`:使用名为`ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle`, tag 为`cuda126-dev`的镜像创建 Docker 容器,/bin/bash 进入容器后启动/bin/bash 命令。
若使用的是从 DockerHub 拉取的镜像创建容器,则修改镜像名即可:
```
- docker run --gpus all --name paddle-test -v $PWD:/paddle --network=host -it paddlepaddle/paddle:latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2 /bin/bash
+ docker run --gpus all --name paddle-test -v $PWD:/paddle --network=host -it paddlepaddle/paddle:cuda126-dev /bin/bash
```
注意:
@@ -141,7 +141,7 @@ cd /paddle
```
git checkout develop
```
-paddle 支持 Python 3.8 以上版本
+paddle 支持 Python 3.9 以上版本
#### 7. 创建并进入/paddle/build 路径下:
```
mkdir -p /paddle/build && cd /paddle/build
@@ -151,7 +151,7 @@ mkdir -p /paddle/build && cd /paddle/build
```
pip3.10 install -r /paddle/python/requirements.txt
```
-注意:以上用 Python3.10 命令来举例,如您的 Python 版本为 3.8/3.9/3.11/3.12,请将上述命令中的 pip3.10 改成 pip3.8/pip3.9/pip3.11/pip3.12
+注意:以上用 Python3.10 命令来举例,如您的 Python 版本为 3.9/3.11/3.12/3.13,请将上述命令中的 pip3.10 改成 pip3.9/pip3.11/pip3.12/pip3.13
#### 9. 执行 cmake:
* 对于需要编译**CPU 版本 PaddlePaddle**的用户:
```
@@ -159,7 +159,7 @@ pip3.10 install -r /paddle/python/requirements.txt
```
* 对于需要编译**GPU 版本 PaddlePaddle**的用户:
```
- cmake .. -GNinja -DPY_VERSION=3.10 -DWITH_GPU=ON
+ cmake .. -GNinja -DPY_VERSION=3.10 -DWITH_GPU=ON -DWITH_DISTRIBUTE=ON
```
- 具体编译选项含义请参见[编译选项表](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#Compile)
- 请注意修改参数`-DPY_VERSION`为您希望编译使用的 python 版本, 例如`-DPY_VERSION=3.10`表示 python 版本为 3.10
@@ -179,7 +179,7 @@ For Python3:
```
pip3.10 install -U [whl 包的名字]
```
-注意:以上用 Python3.10 命令来举例,如您的 Python 版本为 3.8/3.9/3.11/3.12,请将上述命令中的 pip3.10 改成 pip3.8/pip3.9/pip3.11/pip3.12。
+注意:以上用 Python3.10 命令来举例,如您的 Python 版本为 3.9/3.11/3.12/3.13,请将上述命令中的 pip3.10 改成 pip3.9/pip3.11/pip3.12/pip3.13。
#### 恭喜,至此您已完成 PaddlePaddle 的编译安装。您只需要进入 Docker 容器后运行 PaddlePaddle,即可开始使用。更多 Docker 使用请参见[Docker 官方文档](https://docs.docker.com)
### 本机编译
@@ -203,15 +203,13 @@ uname -m && cat /etc/*release
apt update
```
#### 3. 安装 NCCL(可选)
-* 如果您需要使用 GPU 多卡,请确保您已经正确安装 nccl2,或者按照以下指令安装 nccl2(这里提供的是 CUDA11.2,cuDNN8 下 nccl2 的安装指令,更多版本的安装信息请参考 NVIDIA[官方网站](https://developer.nvidia.com/nccl)):
+* 如果您需要使用 GPU 多卡,请确保您已经正确安装 nccl2,或者按照以下指令安装 nccl2(这里提供的是 CUDA11.8,cuDNN8 下 nccl2 的安装指令,更多版本的安装信息请参考 NVIDIA[官方网站](https://developer.nvidia.com/nccl)):
```
- rm -f /usr/local/lib/libnccl.so
- wget --no-check-certificate -q https://nccl2-deb.cdn.bcebos.com/libnccl-2.10.3-1+cuda11.4.x86_64.rpm
- wget --no-check-certificate -q https://nccl2-deb.cdn.bcebos.com/libnccl-devel-2.10.3-1+cuda11.4.x86_64.rpm
- wget --no-check-certificate -q https://nccl2-deb.cdn.bcebos.com/libnccl-static-2.10.3-1+cuda11.4.x86_64.rpm
- rpm -ivh libnccl-2.10.3-1+cuda11.4.x86_64.rpm
- rpm -ivh libnccl-devel-2.10.3-1+cuda11.4.x86_64.rpm
- rpm -ivh libnccl-static-2.10.3-1+cuda11.4.x86_64.rpm
+ wget -q https://nccl2-deb.cdn.bcebos.com/nccl_2.16.2-1+cuda11.8_x86_64.txz --no-check-certificate --no-proxy
+ tar xf nccl_2.16.2-1+cuda11.8_x86_64.txz
+ cp -a nccl_2.16.2-1+cuda11.8_x86_64/include/* /usr/include/
+ cp -a nccl_2.16.2-1+cuda11.8_x86_64/lib/* /usr/lib64
+ rm -rf nccl_2.16.2-1+cuda11.8_x86_64 nccl_2.16.2-1+cuda11.8_x86_64.txz
```
#### 4. 安装必要的工具
`bzip2`以及`make`:
@@ -255,12 +253,12 @@ make -j8 && make install
(请参照 Python 官方流程安装)
* b. 安装 pip:
(请参照 Python 官方流程安装)
-* c.(Only For Python3)设置 Python3 相关的环境变量,这里以 python3.10 版本示例,请替换成您使用的版本(3.8、3.9、3.10、3.11、3.12):
+* c.(Only For Python3)设置 Python3 相关的环境变量,这里以 python3.10 版本示例,请替换成您使用的版本(3.9、3.10、3.11、3.12、3.13):
1. 首先使用
```
find `dirname $(dirname $(which python3))` -name "libpython3.so"
```
- 找到 Python lib 的路径,如果是 3.8、3.9、3.10、3.11、3.12,请将`python3`改成`python3.8`、`python3.9`,`python3.10`, `python3.11`,`python3.12`然后将下面[python-lib-path]替换为找到文件路径
+ 找到 Python lib 的路径,如果是 3.9、3.10、3.11、3.12、3.13,请将`python3`改成`python3.9`,`python3.10`, `python3.11`,`python3.12`,`python3.13`然后将下面[python-lib-path]替换为找到文件路径
2. 设置 PYTHON_LIBRARIES:
```
export PYTHON_LIBRARY=[python-lib-path]
@@ -279,7 +277,7 @@ make -j8 && make install
export PATH=[python-lib-path]:$PATH
```
(这里将[python-lib-path]的最后两级目录替换为/bin/)
-* d. 安装虚环境`virtualenv`以及`virtualenvwrapper`并创建名为`paddle-venv`的虚环境:(请注意对应 python 版本的 pip3 的命令,如 pip3.8、pip3.9、pip3.10、pip3.11、pip3.12))
+* d. 安装虚环境`virtualenv`以及`virtualenvwrapper`并创建名为`paddle-venv`的虚环境:(请注意对应 python 版本的 pip3 的命令,如 pip3.9、pip3.10、pip3.11、pip3.12、pip3.13)
1. 安装`virtualenv`
```
pip install virtualenv
@@ -351,7 +349,7 @@ mkdir build && cd build
pip3.10 install -r /paddle/python/requirements.txt
```
-注意:以上用 Python3.10 命令来举例,如您的 Python 版本为 3.8/3.9/3.11/3.12,请将上述命令中的 pip3.10 改成 pip3.8/pip3.9/pip3.11/pip3.12
+注意:以上用 Python3.10 命令来举例,如您的 Python 版本为 3.9/3.11/3.12/3.13,请将上述命令中的 pip3.10 改成 pip3.9/pip3.11/pip3.12/pip3.13
#### 12. 执行 cmake:
>具体编译选项含义请参见[编译选项表](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#Compile)
* 对于需要编译**CPU 版本 PaddlePaddle**的用户:
@@ -361,22 +359,20 @@ pip3.10 install -r /paddle/python/requirements.txt
```
> 如果遇到`Could NOT find PROTOBUF (missing: PROTOBUF_LIBRARY PROTOBUF_INCLUDE_DIR)`可以重新执行一次 cmake 指令。
> 请注意 PY_VERSION 参数更换为您需要的 python 版本
-* 对于需要编译**GPU 版本 PaddlePaddle**的用户:(**仅支持 CentOS7(** CUDA11.0 - CUDA12.0 **)**)
- 1. 请确保您已经正确安装 nccl2,或者按照以下指令安装 nccl2(这里提供的是 CUDA11.2,cuDNN8 下 nccl2 的安装指令,更多版本的安装信息请参考 NVIDIA[官方网站](https://developer.nvidia.com/nccl)):
+* 对于需要编译**GPU 版本 PaddlePaddle**的用户:(**仅支持 CentOS7(** CUDA11.8 - CUDA12.9 **)**)
+ 1. 请确保您已经正确安装 nccl2,或者按照以下指令安装 nccl2(这里提供的是 CUDA11.8,cuDNN8 下 nccl2 的安装指令,更多版本的安装信息请参考 NVIDIA[官方网站](https://developer.nvidia.com/nccl)):
```
- rm -f /usr/local/lib/libnccl.so
- wget --no-check-certificate -q https://nccl2-deb.cdn.bcebos.com/libnccl-2.10.3-1+cuda11.4.x86_64.rpm
- wget --no-check-certificate -q https://nccl2-deb.cdn.bcebos.com/libnccl-devel-2.10.3-1+cuda11.4.x86_64.rpm
- wget --no-check-certificate -q https://nccl2-deb.cdn.bcebos.com/libnccl-static-2.10.3-1+cuda11.4.x86_64.rpm
- rpm -ivh libnccl-2.10.3-1+cuda11.4.x86_64.rpm
- rpm -ivh libnccl-devel-2.10.3-1+cuda11.4.x86_64.rpm
- rpm -ivh libnccl-static-2.10.3-1+cuda11.4.x86_64.rpm
+ wget -q https://nccl2-deb.cdn.bcebos.com/nccl_2.16.2-1+cuda11.8_x86_64.txz --no-check-certificate --no-proxy
+ tar xf nccl_2.16.2-1+cuda11.8_x86_64.txz
+ cp -a nccl_2.16.2-1+cuda11.8_x86_64/include/* /usr/include/
+ cp -a nccl_2.16.2-1+cuda11.8_x86_64/lib/* /usr/lib64
+ rm -rf nccl_2.16.2-1+cuda11.8_x86_64 nccl_2.16.2-1+cuda11.8_x86_64.txz
```
2. 如果您已经正确安装了`nccl2`,就可以开始 cmake 了:(*For Python3: 请给 PY_VERSION 参数配置正确的 python 版本*)
```
- cmake .. -GNinja -DPYTHON_EXECUTABLE:FILEPATH=[您可执行的 Python3 的路径] -DPYTHON_INCLUDE_DIR:PATH=[之前的 PYTHON_INCLUDE_DIRS] -DPYTHON_LIBRARY:FILEPATH=[之前的 PYTHON_LIBRARY] -DWITH_GPU=ON
+ cmake .. -GNinja -DPYTHON_EXECUTABLE:FILEPATH=[您可执行的 Python3 的路径] -DPYTHON_INCLUDE_DIR:PATH=[之前的 PYTHON_INCLUDE_DIRS] -DPYTHON_LIBRARY:FILEPATH=[之前的 PYTHON_LIBRARY] -DWITH_GPU=ON -DWITH_DISTRIBUTE=ON
```
-注意:以上涉及 Python3 的命令,用 Python3.10 来举例,如您的 Python 版本为 3.8/3.9/3.11/3.12,请将上述命令中的 Python3.10 改成 Python3.8/Python3.9/Python3.11/Python3.12
+注意:以上涉及 Python3 的命令,用 Python3.10 来举例,如您的 Python 版本为 3.9/3.11/3.12/3.13,请将上述命令中的 Python3.10 改成 Python3.9/Python3.11/Python3.12/Python3.13
#### 13. 使用以下命令来编译:
```
ninja -j$(nproc)
diff --git a/docs/install/compile/macos-compile-make.md b/docs/install/compile/macos-compile-make.md
index 449aa755508..b91aa0528f7 100644
--- a/docs/install/compile/macos-compile-make.md
+++ b/docs/install/compile/macos-compile-make.md
@@ -2,8 +2,8 @@
## 环境准备
-* **macOS 版本 10.x/11.x/12.x/13.x/14.x (64 bit) (不支持 GPU 版本)**
-* **Python 版本 3.8/3.9/3.10/3.11/3.12 (64 bit)**
+* **macOS 版本 10.x/11.x/12.x/13.x/14.x/15.x (64 bit) (不支持 GPU 版本)**
+* **Python 版本 3.9/3.10/3.11/3.12/3.13 (64 bit)**
## 选择 CPU/GPU
@@ -89,7 +89,7 @@ cd /paddle
git checkout develop
```
-paddle 支持 Python 3.8 以上版本
+paddle 支持 Python 3.9 以上版本
#### 8. 创建并进入/paddle/build 路径下:
@@ -104,7 +104,7 @@ mkdir -p /paddle/build && cd /paddle/build
pip3.10 install -r /paddle/python/requirements.txt
```
-注意:以上用 Python3.10 命令来举例,如您的 Python 版本为 3.8/3.9/3.11/3.12,请将上述命令中的 pip3.10 改成 pip3.8/pip3.9/pip3.11/pip3.12
+注意:以上用 Python3.10 命令来举例,如您的 Python 版本为 3.9/3.11/3.12/3.13,请将上述命令中的 pip3.10 改成 pip3.9/pip3.11/pip3.12/pip3.13
#### 10. 执行 cmake:
@@ -138,7 +138,7 @@ cd /paddle/build/python/dist
pip3.10 install -U [whl 包的名字]
```
-注意:以上用 Python3.10 命令来举例,如您的 Python 版本为 3.8/3.9/3.11/3.12,请将上述命令中的 pip3.10 改成 pip3.8/pip3.9/pip3.11/pip3.12。
+注意:以上用 Python3.10 命令来举例,如您的 Python 版本为 3.9/3.11/3.12/3.13,请将上述命令中的 pip3.10 改成 pip3.9/pip3.11/pip3.12/pip3.13。
#### 恭喜,至此您已完成 PaddlePaddle 的编译安装。您只需要进入 Docker 容器后运行 PaddlePaddle,即可开始使用。更多 Docker 使用请参见[Docker 官方文档](https://docs.docker.com)
@@ -157,7 +157,7 @@ uname -m
#### 2. 安装 Python 以及 pip:
-> **请不要使用 macOS 中自带 Python**,我们强烈建议您使用[Homebrew](https://brew.sh)安装 python(对于**Python3**请使用 python[官方下载](https://www.python.org/downloads/mac-osx/)python3.8、python3.9、python3.10、python3.11、python3.12), pip 以及其他的依赖,这将会使您高效编译。
+> **请不要使用 macOS 中自带 Python**,我们强烈建议您使用[Homebrew](https://brew.sh)安装 python(对于**Python3**请使用 python[官方下载](https://www.python.org/downloads/mac-osx/)python3.9、python3.10、python3.11、python3.12、python3.13), pip 以及其他的依赖,这将会使您高效编译。
使用 Python 官网安装
@@ -233,7 +233,7 @@ cd Paddle
git checkout develop
```
-paddle 支持 Python 3.8 以上版本
+paddle 支持 Python 3.9 以上版本
#### 7. 并且请创建并进入一个叫 build 的目录下:
@@ -248,7 +248,7 @@ mkdir build && cd build
pip3.10 install -r /paddle/python/requirements.txt
```
-注意:以上用 Python3.10 命令来举例,如您的 Python 版本为 3.8/3.9/3.11/3.12,请将上述命令中的 pip3.10 改成 pip3.8/pip3.9/pip3.11/pip3.12
+注意:以上用 Python3.10 命令来举例,如您的 Python 版本为 3.9/3.11/3.12/3.13,请将上述命令中的 pip3.10 改成 pip3.9/pip3.11/pip3.12/pip3.13
#### 9. 执行 cmake:
diff --git a/docs/install/compile/macos-compile-make_en.md b/docs/install/compile/macos-compile-make_en.md
index f970f68b14b..bfd2ef464e5 100644
--- a/docs/install/compile/macos-compile-make_en.md
+++ b/docs/install/compile/macos-compile-make_en.md
@@ -2,8 +2,8 @@
## Environment preparation
-* **macOS version 10.x/11.x/12.x/13.x/14.x (64 bit) (not support GPU version)**
-* **Python version 3.8/3.9/3.10/3.11/3.12 (64 bit)**
+* **macOS version 10.x/11.x/12.x/13.x/14.x/15.x (64 bit) (not support GPU version)**
+* **Python version 3.9/3.10/3.11/3.12/3.13 (64 bit)**
## Choose CPU/GPU
@@ -93,7 +93,7 @@ cd /paddle
git checkout develop
```
-Paddle supports Python version 3.8 and above
+Paddle supports Python version 3.9 and above
#### 8. Create and enter the /paddle/build path:
@@ -109,7 +109,7 @@ mkdir -p /paddle/build && cd /paddle/build
pip3.10 install protobuf==3.20.2
```
-Note: We used Python3.10 command as an example above, if the version of your Python is 3.8/3.9/3.11/3.12, please change pip3.10 in the commands to pip3.8/pip3.9/3.11/3.12
+Note: The commands above use Python 3.10 as an example. If your Python version is 3.9/3.11/3.12/3.13, change pip3.10 in the commands to pip3.9/pip3.11/pip3.12/pip3.13
> Installing patchelf, PatchELF is a small and useful program for modifying the dynamic linker and RPATH of ELF executables.
@@ -158,7 +158,7 @@ pip3.10 install -U [whl package name]
```
Note:
-We used Python3.10 command as an example above, if the version of your Python is 3.8/3.9/3.11/3.12, please change pip3.10 in the commands to pip3.8/pip3.9/pip3.11/pip3.12.
+The commands above use Python 3.10 as an example. If your Python version is 3.9/3.11/3.12/3.13, change pip3.10 in the commands to pip3.9/pip3.11/pip3.12/pip3.13.
#### Congratulations, now that you have successfully installed PaddlePaddle using Docker, you only need to run PaddlePaddle after entering the Docker container. For more Docker usage, please refer to the [official Docker documentation](https://docs.docker.com/).
@@ -173,7 +173,7 @@ We used Python3.10 command as an example above, if the version of your Python is
#### 2. Install python and pip:
-> **Please do not use the Python initially given by macOS**, we strongly recommend that you use [Homebrew](https://brew.sh/) to install python (for Python3 please use python [official download](https://www.python.org/downloads/mac-osx/) python3.8, python3.9, python3.10, python3.11, python3.12), pip and other dependencies, This will greatly reduce the difficulty of installing and compiling.
+> **Please do not use the Python that ships with macOS**. We strongly recommend that you use [Homebrew](https://brew.sh/) to install python (for Python3, please use the python [official download](https://www.python.org/downloads/mac-osx/) of python3.9, python3.10, python3.11, python3.12, or python3.13), pip, and other dependencies. This will greatly reduce the difficulty of installing and compiling.
Install using Python official website
@@ -248,7 +248,7 @@ git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle
```
-#### 6. Switch to develop branch to compile: (Paddle supports Python version 3.8 and above)
+#### 6. Switch to develop branch to compile: (Paddle supports Python version 3.9 and above)
```
git checkout develop
diff --git a/docs/install/compile/macos-compile-ninja.md b/docs/install/compile/macos-compile-ninja.md
index 3cc70387738..3ed75bf8c8d 100644
--- a/docs/install/compile/macos-compile-ninja.md
+++ b/docs/install/compile/macos-compile-ninja.md
@@ -2,8 +2,8 @@
## 环境准备
-* **macOS 版本 10.x/11.x/12.x/13.x/14.x (64 bit) (不支持 GPU 版本)**
-* **Python 版本 3.8/3.9/3.10/3.11/3.12 (64 bit)**
+* **macOS 版本 10.x/11.x/12.x/13.x/14.x/15.x (64 bit) (不支持 GPU 版本)**
+* **Python 版本 3.9/3.10/3.11/3.12/3.13 (64 bit)**
## 选择 CPU/GPU
@@ -73,7 +73,7 @@ cd /paddle
```
git checkout develop
```
-paddle 支持 Python 3.8 以上版本
+paddle 支持 Python 3.9 以上版本
#### 8. 创建并进入/paddle/build 路径下:
```
@@ -84,7 +84,7 @@ mkdir -p /paddle/build && cd /paddle/build
```
pip3.10 install -r /paddle/python/requirements.txt
```
-注意:以上用 Python3.10 命令来举例,如您的 Python 版本为 3.8/3.9/3.11/3.12,请将上述命令中的 pip3.10 改成 pip3.8/pip3.9/pip3.11/pip3.12
+注意:以上用 Python3.10 命令来举例,如您的 Python 版本为 3.9/3.11/3.12/3.13,请将上述命令中的 pip3.10 改成 pip3.9/pip3.11/pip3.12/pip3.13
#### 10. 执行 cmake:
* 对于需要编译**CPU 版本 PaddlePaddle**的用户(我们目前不支持 macOS 下 GPU 版本 PaddlePaddle 的编译):
```
@@ -106,7 +106,7 @@ cd /paddle/build/python/dist
```
pip3.10 install -U [whl 包的名字]
```
-注意:以上用 Python3.10 命令来举例,如您的 Python 版本为 3.8/3.9/3.11/3.12,请将上述命令中的 pip3.10 改成 pip3.8/pip3.9/pip3.11/pip3.12。
+注意:以上用 Python3.10 命令来举例,如您的 Python 版本为 3.9/3.11/3.12/3.13,请将上述命令中的 pip3.10 改成 pip3.9/pip3.11/pip3.12/pip3.13。
#### 恭喜,至此您已完成 PaddlePaddle 的编译安装。您只需要进入 Docker 容器后运行 PaddlePaddle,即可开始使用。更多 Docker 使用请参见[Docker 官方文档](https://docs.docker.com)
@@ -118,7 +118,7 @@ uname -m
```
并且在`关于本机`中查看系统版本。并提前安装[OpenCV](https://opencv.org/releases.html)
#### 2. 安装 Python 以及 pip:
-> **请不要使用 macOS 中自带 Python**,我们强烈建议您使用[Homebrew](https://brew.sh)安装 python(对于**Python3**请使用 python[官方下载](https://www.python.org/downloads/mac-osx/)python3.8、python3.9、python3.10、python3.11、python3.12), pip 以及其他的依赖,这将会使您高效编译。
+> **请不要使用 macOS 中自带 Python**,我们强烈建议您使用[Homebrew](https://brew.sh)安装 python(对于**Python3**请使用 python[官方下载](https://www.python.org/downloads/mac-osx/)python3.9、python3.10、python3.11、python3.12、python3.13), pip 以及其他的依赖,这将会使您高效编译。
使用 Python 官网安装
> 请注意,当您的 mac 上安装有多个 python 时请保证您正在使用的 python 是您希望使用的 python。
#### 3. (Only For Python3)设置 Python 相关的环境变量:
@@ -172,7 +172,7 @@ cd Paddle
```
git checkout develop
```
-paddle 支持 Python 3.8 以上版本
+paddle 支持 Python 3.9 以上版本
#### 7. 并且请创建并进入一个叫 build 的目录下:
```
mkdir build && cd build
@@ -184,7 +184,7 @@ mkdir build && cd build
pip3.10 install -r /paddle/python/requirements.txt
```
-注意:以上用 Python3.10 命令来举例,如您的 Python 版本为 3.8/3.9/3.11/3.12,请将上述命令中的 pip3.10 改成 pip3.8/pip3.9/pip3.11/pip3.12
+注意:以上用 Python3.10 命令来举例,如您的 Python 版本为 3.9/3.11/3.12/3.13,请将上述命令中的 pip3.10 改成 pip3.9/pip3.11/pip3.12/pip3.13
#### 9. 执行 cmake:
>具体编译选项含义请参见[编译选项表](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#Compile)
* 对于需要编译**CPU 版本 PaddlePaddle**的用户:
diff --git a/docs/install/compile/windows-compile.md b/docs/install/compile/windows-compile.md
index db5e13c0886..bc83e30f513 100644
--- a/docs/install/compile/windows-compile.md
+++ b/docs/install/compile/windows-compile.md
@@ -7,7 +7,7 @@
## 环境准备
* **Windows 7/8/10 专业版/企业版 (64bit)**
-* **Python 版本 3.8/3.9/3.10/3.11/3.12 (64 bit)**
+* **Python 版本 3.9/3.10/3.11/3.12/3.13 (64 bit)**
* **Visual Studio 2017/2019 社区版/专业版/企业版**
## 选择 CPU/GPU
@@ -24,7 +24,7 @@
> **git**:官网下载[链接](https://github.com/git-for-windows/git/releases/download/v2.35.1.windows.2/Git-2.35.1.2-64-bit.exe),使用默认选项安装。
- > **python**:官网[链接](https://www.python.org/downloads/windows/),可选择 3.8/3.9/3.10/3.11/3.12 中任一版本的 Windows installer(64-bit)安装。安装时注意勾选 `Add Python 3.x to PATH`,将 Python 添加到环境变量中。
+ > **python**:官网[链接](https://www.python.org/downloads/windows/),可选择 3.9/3.10/3.11/3.12/3.13 中任一版本的 Windows installer(64-bit)安装。安装时注意勾选 `Add Python 3.x to PATH`,将 Python 添加到环境变量中。
> **Visual studio**:VS2017 仅用于 CPU 版编译,建议安装 VS2019。官网[链接](https://visualstudio.microsoft.com/zh-hans/vs/older-downloads/),需要登录后下载,建议下载 Community 社区版。在安装时需要在工作负荷一栏中勾选 `使用 C++的桌面开发` 和 `通用 Windows 平台开发`,并在语言包一栏中选择 `英语`。
@@ -66,7 +66,7 @@
编译 GPU 版本的 Paddle:
```
- cmake .. -GNinja -DWITH_GPU=ON -DWITH_UNITY_BUILD=ON
+ cmake .. -GNinja -DWITH_GPU=ON -DWITH_UNITY_BUILD=ON -DWITH_DISTRIBUTE=ON
```
其他编译选项含义请参见[编译选项表](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#Compile)。
@@ -76,12 +76,12 @@
```
set CUDA_TOOLKIT_ROOT_DIR=C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
set PATH=%CUDA_TOOLKIT_ROOT_DIR:/=\%\bin;%CUDA_TOOLKIT_ROOT_DIR:/=\%\libnvvp;%PATH%
- cmake .. -GNinja -DWITH_GPU=ON -DCUDA_TOOLKIT_ROOT_DIR="%CUDA_TOOLKIT_ROOT_DIR%" -DWITH_UNITY_BUILD=ON
+ cmake .. -GNinja -DWITH_GPU=ON -DCUDA_TOOLKIT_ROOT_DIR="%CUDA_TOOLKIT_ROOT_DIR%" -DWITH_UNITY_BUILD=ON -DWITH_DISTRIBUTE=ON
```
> 2. 如果本机安装了多个 Python,将自动使用最新安装的 Python 版本。若需要指定 Python 版本,则需要指定 Python 路径,例如:
```
cmake .. -GNinja -DWITH_GPU=ON -DPYTHON_EXECUTABLE=C:\Python38\python.exe -DPYTHON_INCLUDE_DIR=C:\Python38\include -DPYTHON_LIBRARY=C:\Python38\libs\python38.lib
- -DWITH_UNITY_BUILD=ON
+ -DWITH_UNITY_BUILD=ON -DWITH_DISTRIBUTE=ON
```
7. Run the build:
diff --git a/docs/install/compile/windows-compile_en.md b/docs/install/compile/windows-compile_en.md
index 711ae115d3b..c5e7f167c45 100644
--- a/docs/install/compile/windows-compile_en.md
+++ b/docs/install/compile/windows-compile_en.md
@@ -4,7 +4,7 @@
* **Windows 7/8/10 Pro/Enterprise(64bit)**
* **GPU Version support CUDA 11.0 - 12.0, and only support single GPU**
-* **Python version 3.8+/3.9+/3.10+/3.11+/3.12+(64bit)**
+* **Python version 3.9+/3.10+/3.11+/3.12+/3.13+ (64bit)**
* **pip version 20.2.2 or above (64bit)**
* **Visual Studio 2017(for CPU)/2019(for GPU)**
@@ -29,7 +29,7 @@ There is one compilation methods in Windows system:
> CMake requires version 3.17 or above and must be added to the environment variables.
- > Python requires version 3.8 and above, which can be downloaded from the [official website](https://www.python.org/downloads/release).
+ > Python requires version 3.9 and above, which can be downloaded from the [official website](https://www.python.org/downloads/release).
* After installing Python, check that the Python version is the expected one with `python --version`, because you may have more than one Python installed on your computer. You can resolve conflicts between multiple Pythons by changing the order of the environment variables.
@@ -54,7 +54,7 @@ There is one compilation methods in Windows system:
git checkout develop
```
- Note: Paddle supports Python version 3.8 and above.
+ Note: Paddle supports Python version 3.9 and above.
4. Create a directory called build and enter it:
@@ -94,11 +94,11 @@ There is one compilation methods in Windows system:
```
set CUDA_TOOLKIT_ROOT_DIR=C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
set PATH=%CUDA_TOOLKIT_ROOT_DIR:/=\%\bin;%CUDA_TOOLKIT_ROOT_DIR:/=\%\libnvvp;%PATH%
- cmake .. -GNinja -DWITH_GPU=ON -DCUDA_TOOLKIT_ROOT_DIR="%CUDA_TOOLKIT_ROOT_DIR%" -DWITH_UNITY_BUILD=ON
+ cmake .. -GNinja -DWITH_GPU=ON -DCUDA_TOOLKIT_ROOT_DIR="%CUDA_TOOLKIT_ROOT_DIR%" -DWITH_UNITY_BUILD=ON -DWITH_DISTRIBUTE=ON
```
> 2. If more than one Python are installed, the latest installed Python will be used by default, and you can choose the Python version by `-DPYTHON_EXECUTABLE` . for example:
```
- cmake .. -GNinja -DWITH_GPU=ON -DPYTHON_EXECUTABLE=C:\\Python38\\python.exe -DWITH_UNITY_BUILD=ON
+ cmake .. -GNinja -DWITH_GPU=ON -DPYTHON_EXECUTABLE=C:\\Python39\\python.exe -DWITH_UNITY_BUILD=ON -DWITH_DISTRIBUTE=ON
```
6. Execute compile:
diff --git a/docs/install/conda/fromconda.rst b/docs/install/conda/fromconda.rst
deleted file mode 100644
index 1a14f0b524f..00000000000
--- a/docs/install/conda/fromconda.rst
+++ /dev/null
@@ -1,10 +0,0 @@
-===========================
-**Conda Installation**
-===========================
-
-.. toctree::
- :maxdepth: 1
-
- linux-conda.md
- macos-conda.md
- windows-conda.md
diff --git a/docs/install/conda/fromconda_en.rst b/docs/install/conda/fromconda_en.rst
deleted file mode 100644
index fb1eb259379..00000000000
--- a/docs/install/conda/fromconda_en.rst
+++ /dev/null
@@ -1,10 +0,0 @@
-==============================
-**Install via conda**
-==============================
-
-.. toctree::
-
-
- linux-conda_en.md
- macos-conda_en.md
- windows-conda_en.md
diff --git a/docs/install/conda/linux-conda.md b/docs/install/conda/linux-conda.md
deleted file mode 100644
index b16bad75a22..00000000000
--- a/docs/install/conda/linux-conda.md
+++ /dev/null
@@ -1,124 +0,0 @@
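-# Installation on Linux via Conda
-
-[Anaconda](https://www.anaconda.com/) is a free and open source distribution of Python and R for scientific computing that aims to simplify package management and deployment. Anaconda packages are managed with the Conda package management system. Conda is an open source package and environment management system that runs on Windows, macOS, and Linux. This document describes installation via Anaconda; the Anaconda packages provided by PaddlePaddle support distributed training (multi-machine, multi-GPU) and TensorRT inference.
-
-
-## 1. Environment Preparation
-
-### 1.1 Create a Virtual Environment
-
-#### 1.1.1 Set Up the Environment
-
-First create an Anaconda virtual environment for your specific Python version. The Anaconda installation of PaddlePaddle supports Python 3.9 - 3.13.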
-
-```
-conda create -n paddle_env python=YOUR_PY_VER
-```
-
-
-#### 1.1.2 Enter the Anaconda Virtual Environment
-
-```
-conda activate paddle_env
-```
-
-
-
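-### 1.2 Other Environment Checks
-
-#### 1.2.1 Confirm the Python Installation Path
-
-Confirm that your conda virtual environment and the Python into which PaddlePaddle will be installed are where you expect, because your computer may have multiple Pythons. Open the Anaconda command-line terminal and enter the following command to confirm the Python location.
-
-
-The command to print the Python path is: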
-
-
-```
-which python3
-```
-
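-Depending on your environment, you may need to replace python3 in all commands in these instructions with the specific Python path.
-
-
-
-#### 1.2.2 Check the Python Version
-
-Use the following command to confirm the version: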
-
-```
-python3 --version
-```
-
-
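-#### 1.2.3 Check the System Environment
-
-Confirm that Python and pip are 64bit and that the processor architecture is x86_64 (also called x64, Intel 64, or AMD64). The first line of output below should be "64bit" and the second line "x86_64" (or "x64", "AMD64"):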
-
-```
-python3 -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
-```
-
-
-
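-## 2. Installation
-
-This document describes installation via conda.
-
-### Add the Tsinghua Mirror (Optional)
-
-Users in China who cannot connect to the official Anaconda source can add the Tsinghua mirror with the following commands: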
-
- ```
- conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
- ```
- ```
- conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
- ```
- ```
- conda config --set show_channel_urls yes
- ```
-
-### Install by Version
-
-Choose the PaddlePaddle version you want to install below.
-
-
-#### CPU Version of PaddlePaddle
-
-
-If your computer does not have an NVIDIA® GPU, install the CPU version of PaddlePaddle:
-
-```
-conda install paddlepaddle==3.0.0 -c paddle
-```
-
-
-#### GPU Version of PaddlePaddle
-
-
-* For `CUDA 11.8`, the installation command is:
-
- ```
- conda install paddlepaddle-gpu==3.0.0 paddlepaddle-cuda=11.8 -c paddle -c nvidia
- ```
-
-* For `CUDA 12.6`, the installation command is:
-
- ```
- conda install paddlepaddle-gpu==3.0.0 paddlepaddle-cuda=12.6 -c paddle -c nvidia
- ```
-
-* For `CUDA 12.9`, the installation command is:
-
- ```
- conda install paddlepaddle-gpu==3.0.0 paddlepaddle-cuda=12.9 -c paddle -c nvidia
- ```
-
-
-## **3. Verify Installation**
-
-After installation, run `python3` to enter the Python interpreter, enter `import paddle`, and then enter
- `paddle.utils.run_check()`
-
-If `PaddlePaddle is installed successfully!` appears, the installation succeeded.
diff --git a/docs/install/conda/linux-conda_en.md b/docs/install/conda/linux-conda_en.md
deleted file mode 100644
index b93e8d1fce9..00000000000
--- a/docs/install/conda/linux-conda_en.md
+++ /dev/null
@@ -1,128 +0,0 @@
-# Installation on Linux via Conda
-
-[Anaconda](https://www.anaconda.com/)is a free and open source distribution of Python and R for computational science. Anaconda is dedicated to simplifying package management and deployment. Anaconda's packages are managed using the package management system Conda. Conda is an open source package management system and environment management system that runs on Windows, macOS, and Linux.
-
-
-## Environmental preparation
-
-### 1.1 Create Virtual Environment
-
-#### 1.1.1 Create the Anaconda Virtual Environment
-
-Create virtual environment First create the Anaconda virtual environment according to the specific Python version. The Anaconda installation of PaddlePaddle supports Python version of 3.9 - 3.13.
-
-```
-conda create -n paddle_env python=YOUR_PY_VER
-```
-
-
-
-#### 1.1.2 Enter the Anaconda Virtual Environment
-
-```
-conda activate paddle_env
-```
-
-
-
-### 1.2 Confirm Other Environments
-
-Confirm that your conda virtual environment and the Python loaction which is preapared to install PaddlePaddle are where you expected them for your computer may have multiple Pythons environments. Enter Anaconda's command line terminal and enter the following command to confirm the Python location.
-
-#### 1.2.1 Confirm the installation path of python
-
-Depending on your environment, you may need to replace python3 in all command lines in the instructions with specific Python path.
-
-The command to get the Python path is:
-
-```
-which python3
-```
-
-
-
-#### 1.2.2 Check the version of Python
-
-
-Use the following command to confirm it's version
-
-```
-python3 --version
-```
-
-
-
-#### 1.2.3 Check the system environment
-
-Confirm that Python and pip are 64bit, and the processor architecture is x86_64 (or x64, Intel 64, AMD64) architecture. The first line below print "64bit", the second line prints "x86_64 (or x64, AMD64)."
-
-
-```
-python3 -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
-```
-
-
-
-
-
-## INSTALLATION
-
-### Add Tsinghua source (optional)
-
-For domestic users who cannot connect to the Anaconda official source, you can add Tsinghua source according to the following command.
-
-
-```
-conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
-```
-```
-conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
-```
-```
-conda config --set show_channel_urls yes
-```
-
-
-### Installation Step
-
-You can choose the following version of PaddlePaddle to start installation:
-
-
-
-#### CPU Version of PaddlePaddle
-
-
-If your computer doesn't have NVIDIA® GPU, please install `the CPU Version of PaddlePaddle`
-
-```
-conda install paddlepaddle==3.0.0 -c paddle
-```
-
-
-#### GPU Version of PaddlePaddle
-
-
-* If you are using CUDA 11.8:
-
- ```
- conda install paddlepaddle-gpu==3.0.0 paddlepaddle-cuda=11.8 -c paddle -c nvidia
- ```
-
-* If you are using CUDA 12.6:
-
- ```
- conda install paddlepaddle-gpu==3.0.0 paddlepaddle-cuda=12.6 -c paddle -c nvidia
- ```
-
-* If you are using CUDA 12.9:
-
- ```
- conda install paddlepaddle-gpu==3.0.0 paddlepaddle-cuda=12.9 -c paddle -c nvidia
- ```
-
-
-## Verify installation
-
-After the installation is complete, you can use `python3` to enter the Python interpreter and then use `import paddle` and `paddle.utils.run_check()`
-
-If `PaddlePaddle is installed successfully!` appears, to verify that the installation was successful.
diff --git a/docs/install/conda/macos-conda.md b/docs/install/conda/macos-conda.md
deleted file mode 100644
index 92483697344..00000000000
--- a/docs/install/conda/macos-conda.md
+++ /dev/null
@@ -1,94 +0,0 @@
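-# Installation on macOS via Conda
-
-[Anaconda](https://www.anaconda.com/) is a free and open source distribution of Python and R for scientific computing that aims to simplify package management and deployment. Anaconda packages are managed with the Conda package management system. Conda is an open source package and environment management system that runs on Windows, macOS, and Linux.
-
-## 1. Environment Preparation
-
-### 1.1 Create a Virtual Environment
-
-#### 1.1.1 Set Up the Environment
-
-First create an Anaconda virtual environment for your specific Python version. The Anaconda installation of PaddlePaddle supports Python 3.9 - 3.13.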
-
-```
-conda create -n paddle_env python=YOUR_PY_VER
-```
-
-
-#### 1.1.2 Enter the Anaconda Virtual Environment
-
-
-```
-conda activate paddle_env
-```
-
-
-
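-### 1.2 Other Environment Checks
-
-#### 1.2.1 Confirm the Python Installation Path
-
-Confirm that your conda virtual environment and the Python into which PaddlePaddle will be installed are where you expect, because your computer may have multiple Pythons. Open the Anaconda command-line terminal and enter the following command to confirm the Python location.
-
-The command to print the Python path is: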
-
-```
-which python3
-```
-
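-Depending on your environment, you may need to replace python3 in all commands in these instructions with the specific Python path.
-
-
-
-#### 1.2.2 Check the Python Version
-
-Use the following command to confirm the version: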
-
-```
-python3 --version
-```
-
-
-
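-#### 1.2.3 Check the System Environment
-
-Confirm that Python and pip are 64bit and that the processor architecture is arm64 (Paddle natively supports Mac M-series chips); the x86_64 architecture is no longer supported.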
-
-
-```
-python3 -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
-```
-
-
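-## 2. Installation
-
-This document describes installation via conda.
-
-### Add the Tsinghua Mirror (Optional)
-
-* Users in China who cannot connect to the official Anaconda source can add the Tsinghua mirror with the following commands: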
-
- ```
- conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
- ```
- ```
- conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
- ```
- ```
- conda config --set show_channel_urls yes
- ```
-
-### Install the CPU Version of PaddlePaddle
-
-* Currently, only the CPU version of PaddlePaddle is supported on macOS. Use the following command to install Paddle:
-
- ```
- conda install paddlepaddle==3.0.0 -c paddle
- ```
-
-## **3. Verify Installation**
-
-After installation, run `python3` to enter the Python interpreter, enter `import paddle`, and then enter
- `paddle.utils.run_check()`
-
-If `PaddlePaddle is installed successfully!` appears, the installation succeeded.
diff --git a/docs/install/conda/macos-conda_en.md b/docs/install/conda/macos-conda_en.md
deleted file mode 100644
index f90bc1fd8c9..00000000000
--- a/docs/install/conda/macos-conda_en.md
+++ /dev/null
@@ -1,98 +0,0 @@
-# Installation on macOS via Conda
-
-[Anaconda](https://www.anaconda.com/)is a free and open source distribution of Python and R for computational science. Anaconda is dedicated to simplifying package management and deployment. Anaconda's packages are managed using the package management system Conda. Conda is an open source package management system and environment management system that runs on Windows, macOS, and Linux.
-
-
-
-## Environmental preparation
-
-### 1.1 Create Virtual Environment
-
-#### 1.1.1 Create the Anaconda Virtual Environment
-
-Create virtual environment First create the Anaconda virtual environment according to the specific Python version. The Anaconda installation of PaddlePaddle supports Python version of 3.9 - 3.13.
-
-```
-conda create -n paddle_env python=YOUR_PY_VER
-```
-
-
-
-#### 1.1.2 Enter the Anaconda Virtual Environment
-
-```
-conda activate paddle_env
-```
-
-
-
-### 1.2 Confirm Other Environments
-
-Confirm that your conda virtual environment and the Python loaction which is preapared to install PaddlePaddle are where you expected them for your computer may have multiple Pythons environments. Enter Anaconda's command line terminal and enter the following command to confirm the Python location.
-
-#### 1.2.1 Confirm the installation path of python
-
-Depending on your environment, you may need to replace python3 in all command lines in the instructions with specific Python path.
-
-The command to get the Python path is:
-
-```
-which python3
-```
-
-
-#### 1.2.2 Check the version of Python
-
-Use the following command to confirm it's version
-
-```
-python3 --version
-```
-
-
-
-#### 1.2.3 Check the system environment
-
-
-Confirm that Python and pip are 64bit, and the processor architecture is arm64 (PaddlePaddle already supports Mac M), no longer supporting x86_64 architecture
-
-
-```
-python3 -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
-```
-
-
-
-## INSTALLATION
-
-We will introduce conda installation here.
-
-### Add Tsinghua source (optional)
-
-For domestic users who cannot connect to the Anaconda official source, you can add Tsinghua source according to the following command.
-
-
-```
-conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
-```
-```
-conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
-```
-```
-conda config --set show_channel_urls yes
-```
-
-### Install the CPU version of PaddlePaddle
-
-* Currently, only the CPU version of PaddlePaddle is supported in the macOS environment. Please use the following command to install PaddlePaddle:
-
- ```
- conda install paddlepaddle==3.0.0 -c paddle
- ```
-
-
-## Verify installation
-
-After the installation is complete, you can use `python3` to enter the Python interpreter and then use `import paddle` and `paddle.utils.run_check()`
-
-If `PaddlePaddle is installed successfully!` appears, to verify that the installation was successful.
diff --git a/docs/install/conda/windows-conda.md b/docs/install/conda/windows-conda.md
deleted file mode 100644
index 293b768b982..00000000000
--- a/docs/install/conda/windows-conda.md
+++ /dev/null
@@ -1,126 +0,0 @@
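-# Installation on Windows via Conda
-
-[Anaconda](https://www.anaconda.com/) is a free and open source distribution of Python and R for scientific computing that aims to simplify package management and deployment. Anaconda packages are managed with the Conda package management system. Conda is an open source package and environment management system that runs on Windows, macOS, and Linux. This document describes installation via Anaconda; the Anaconda packages provided by PaddlePaddle support TensorRT inference.
-
-## 1. Environment Preparation
-
-
-### 1.1 Create a Virtual Environment
-
-#### 1.1.1 Set Up the Environment
-
-First create an Anaconda virtual environment for your specific Python version. The Anaconda installation of PaddlePaddle supports Python 3.9 - 3.13.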
-
-```
-conda create -n paddle_env python=YOUR_PY_VER
-```
-
-
-#### 1.1.2 Enter the Anaconda Virtual Environment
-
-```
-activate paddle_env
-```
-
-
-
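-### 1.2 Other Environment Checks
-
-#### 1.2.1 Confirm the Python Installation Path
-
-Confirm that your conda virtual environment and the Python into which PaddlePaddle will be installed are where you expect, because your computer may have multiple Pythons. Open the Anaconda command-line terminal and enter the following command to confirm the Python location.
-
-The command to print the Python path is: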
-
-```
-where python
-```
-
-
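-Depending on your environment, you may need to replace python in all commands in these instructions with the specific Python path.
-
-
-
-#### 1.2.2 Check the Python Version
-
-Use the following command to confirm the version: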
-
-```
-python --version
-```
-
-
-
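-#### 1.2.3 Check the System Environment
-
-Confirm that Python and pip are 64bit and that the processor architecture is x86_64 (also called x64, Intel 64, or AMD64). The first line of output below should be "64bit" and the second line "x86_64" (or "x64", "AMD64"):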
-
-
-```
-python -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
-```
-
-
-
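-## 2. Installation
-
-This document describes installation via conda.
-
-### Add the Tsinghua Mirror (Optional)
-
-Users in China who cannot connect to the official Anaconda source can add the Tsinghua mirror with the following commands: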
-
- ```
- conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
- ```
- ```
- conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
- ```
- ```
- conda config --set show_channel_urls yes
- ```
-
-
-### Install by Version
-
-Choose the PaddlePaddle version you want to install below.
-
-
-#### CPU Version of PaddlePaddle
-
-If your computer does not have an NVIDIA® GPU, install the CPU version of PaddlePaddle:
-
-
-```
-conda install paddlepaddle==3.0.0 -c paddle
-```
-
-
-#### GPU Version of PaddlePaddle
-
-
-* For `CUDA 11.8`, the installation command is:
-
- ```
- conda install paddlepaddle-gpu==3.0.0 paddlepaddle-cuda=11.8 -c paddle -c nvidia
- ```
-
-* For `CUDA 12.6`, the installation command is:
-
- ```
- conda install paddlepaddle-gpu==3.0.0 paddlepaddle-cuda=12.6 -c paddle -c nvidia
- ```
-
-* For `CUDA 12.9`, the installation command is:
-
- ```
- conda install paddlepaddle-gpu==3.0.0 paddlepaddle-cuda=12.9 -c paddle -c nvidia
- ```
-
-
-## **3. Verify Installation**
-
-After installation, run `python` or `python3` to enter the Python interpreter, enter `import paddle`, and then enter
- `paddle.utils.run_check()`
-
-If `PaddlePaddle is installed successfully!` appears, the installation succeeded.
diff --git a/docs/install/conda/windows-conda_en.md b/docs/install/conda/windows-conda_en.md
deleted file mode 100644
index e594632f69d..00000000000
--- a/docs/install/conda/windows-conda_en.md
+++ /dev/null
@@ -1,132 +0,0 @@
-# Installation on Windows via Conda
-
-[Anaconda](https://www.anaconda.com/)is a free and open source distribution of Python and R for computational science. Anaconda is dedicated to simplifying package management and deployment. Anaconda's packages are managed using the package management system Conda. Conda is an open source package management system and environment management system that runs on Windows, macOS, and Linux.
-
-
-
-## Environmental preparation
-
-### 1.1 Create Virtual Environment
-
-#### 1.1.1 Create the Anaconda Virtual Environment
-
-Create virtual environment First create the Anaconda virtual environment according to the specific Python version. The Anaconda installation of PaddlePaddle supports Python version of 3.9 - 3.13.
-
-```
-conda create -n paddle_env python=YOUR_PY_VER
-```
-
-
-
-#### 1.1.2 Enter the Anaconda Virtual Environment
-
-```
-activate paddle_env
-```
-
-
-
-### 1.2 Confirm Other Environments
-
-Confirm that your conda virtual environment and the Python loaction which is preapared to install PaddlePaddle are where you expected them for your computer may have multiple Pythons environments. Enter Anaconda's command line terminal and enter the following command to confirm the Python location.
-
-#### 1.2.1 Confirm the installation path of python
-
-Depending on your environment, you may need to replace python in all command lines in the instructions with specific Python path.
-
-The command to get the Python path is:
-
-```
-where python
-```
-
-
-
-#### 1.2.2 Check the version of Python
-
-Use the following command to confirm it's version
-
-```
-python --version
-```
-
-
-
-#### 1.2.3 Check the system environment
-
-Confirm that Python and pip are 64bit, and the processor architecture is x86_64 (or x64, Intel 64, AMD64) architecture. The first line below print "64bit", the second line prints "x86_64 (or x64, AMD64)."
-
-
-```
-python -c "import platform;print(platform.architecture()[0]);print(platform.machine())"
-```
-
-
-
-
-
-## INSTALLATION
-
-We will introduce conda installation here.
-
-### Add Tsinghua source (optional)
-
-For domestic users who cannot connect to the Anaconda official source, you can add Tsinghua source according to the following command.
-
-
-```
-conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
-```
-```
-conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
-```
-```
-conda config --set show_channel_urls yes
-```
-
-
-### Installation Step
-
-You can choose the following version of PaddlePaddle to start installation:
-
-
-
-#### CPU Version of PaddlePaddle
-
-
-If your computer doesn't have NVIDIA® GPU, please install `the CPU Version of PaddlePaddle`
-
-```
-conda install paddlepaddle==3.0.0 -c paddle
-```
-
-
-
-
-#### GPU Version of PaddlePaddle
-
-
-* If you are using CUDA 11.8:
-
- ```
- conda install paddlepaddle-gpu==3.0.0 paddlepaddle-cuda=11.8 -c paddle -c nvidia
- ```
-
-* If you are using CUDA 12.6:
-
- ```
- conda install paddlepaddle-gpu==3.0.0 paddlepaddle-cuda=12.6 -c paddle -c nvidia
- ```
-
-* If you are using CUDA 12.9:
-
- ```
- conda install paddlepaddle-gpu==3.0.0 paddlepaddle-cuda=12.9 -c paddle -c nvidia
- ```
-
-
-## Verify installation
-
-After the installation is complete, you can use `python` or `python3` to enter the Python interpreter and then use `import paddle` and `paddle.utils.run_check()`
-
-If `PaddlePaddle is installed successfully!` appears, to verify that the installation was successful.
diff --git a/docs/install/docker/docker_list.md b/docs/install/docker/docker_list.md
index 60179698128..496e9c222c8 100644
--- a/docs/install/docker/docker_list.md
+++ b/docs/install/docker/docker_list.md
@@ -11,7 +11,6 @@
Image name |
CUDA |
CUDNN |
- TRT |
NCCL |
GCC |
@@ -22,56 +21,28 @@
CPU |
|
|
- |
12.2 |
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda11.2-cudnn8.2-trt8.0-gcc82 |
- 11.2 |
- 8.2 |
- 8.0 |
- 2.8.4 |
- 8.2 |
-
-
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda11.6-cudnn8.4-trt8.4-gcc82 |
- 11.6 |
- 8.4 |
- 8.4.0.6 |
- 2.12.10 |
- 8.2 |
-
-
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda11.7-cudnn8.4-trt8.4-gcc82 |
- 11.7 |
- 8.4 |
- 8.4.2.4 |
- 2.13.4 |
- 8.2 |
-
-
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda11.8-cudnn8.6-trt8.5-gcc82 |
+ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:cuda118-dev |
11.8 |
- 8.6 |
- 8.5 |
+ 8.9 |
2.15.5 |
- 8.2 |
+ 11.4 |
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2 |
- 12.0 |
- 8.9 |
- 8.6 |
- 2.17.1 |
- 12.2 |
+ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:cuda126-dev |
+ 12.6 |
+ 9.5 |
+ 2.23.4 |
+ 11.4 |
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda12.3-cudnn9.0-trt8.6-gcc12.2 |
- 12.3 |
- 9.0 |
- 8.6 |
- 2.17.1 |
- 12.2 |
+ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:cuda129-dev |
+ 12.9 |
+ 9.9 |
+ 2.26.5 |
+ 11.4 |
diff --git a/docs/install/docker/docker_list_en.md b/docs/install/docker/docker_list_en.md
index ac4eb66e8f5..a1afba8b9f9 100644
--- a/docs/install/docker/docker_list_en.md
+++ b/docs/install/docker/docker_list_en.md
@@ -11,7 +11,6 @@ This document introduces the Docker environment commonly used by PaddlePaddle
Images |
CUDA |
CUDNN |
- TRT |
NCCL |
GCC |
@@ -22,56 +21,28 @@ This document introduces the Docker environment commonly used by PaddlePaddle
CPU |
|
|
- |
12.2 |
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda11.2-cudnn8.2-trt8.0-gcc82 |
- 11.2 |
- 8.2 |
- 8.0 |
- 2.8.4 |
- 8.2 |
-
-
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda11.6-cudnn8.4-trt8.4-gcc82 |
- 11.6 |
- 8.4 |
- 8.4.0.6 |
- 2.12.10 |
- 8.2 |
-
-
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda11.7-cudnn8.4-trt8.4-gcc82 |
- 11.7 |
- 8.4 |
- 8.4.2.4 |
- 2.13.4 |
- 8.2 |
-
-
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda11.8-cudnn8.6-trt8.5-gcc82 |
+ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:cuda118-dev |
11.8 |
- 8.6 |
- 8.5 |
+ 8.9 |
2.15.5 |
- 8.2 |
+ 11.4 |
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2 |
- 12.0 |
- 8.9 |
- 8.6 |
- 2.17.1 |
- 12.2 |
+ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:cuda126-dev |
+ 12.6 |
+ 9.5 |
+ 2.23.4 |
+ 11.4 |
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:latest-dev-cuda12.3-cudnn9.0-trt8.6-gcc12.2 |
- 12.3 |
- 9.0 |
- 8.6 |
- 2.17.1 |
- 12.2 |
+ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:cuda129-dev |
+ 12.9 |
+ 9.9 |
+ 2.26.5 |
+ 11.4 |
diff --git a/docs/install/docker/linux-docker.md b/docs/install/docker/linux-docker.md
index e5f644d48f4..c42fcabdae2 100644
--- a/docs/install/docker/linux-docker.md
+++ b/docs/install/docker/linux-docker.md
@@ -21,46 +21,46 @@
* CPU version of PaddlePaddle:
```
- docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0
+ docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1
```
* CPU version of PaddlePaddle, with jupyter pre-installed in the image:
```
- docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-jupyter
+ docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-jupyter
```
* GPU version of PaddlePaddle (**pulling the latest image is recommended; make sure NVIDIA Container Toolkit is installed successfully**):
```
- docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-gpu-cuda11.8-cudnn8.9-trt8.6
+ docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-gpu-cuda11.8-cudnn8.9
```
```
- docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-gpu-cuda12.6-cudnn9.5-trt10.5
+ docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-gpu-cuda12.6-cudnn9.5
```
```
- docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-gpu-cuda12.9-cudnn9.9-trt10.5
+ docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-gpu-cuda12.9-cudnn9.9
```
If your machine is not in mainland China, you can pull the image directly from DockerHub:
* CPU version of PaddlePaddle:
```
- docker pull paddlepaddle/paddle:3.1.0
+ docker pull paddlepaddle/paddle:3.1.1
```
* CPU version of PaddlePaddle, with jupyter pre-installed in the image:
```
- docker pull paddlepaddle/paddle:3.1.0-jupyter
+ docker pull paddlepaddle/paddle:3.1.1-jupyter
```
* GPU version of PaddlePaddle (**pulling the latest image is recommended; make sure NVIDIA Container Toolkit is installed successfully**):
```
- docker pull paddlepaddle/paddle:3.1.0-gpu-cuda11.8-cudnn8.9-trt8.6
+ docker pull paddlepaddle/paddle:3.1.1-gpu-cuda11.8-cudnn8.9
```
```
- docker pull paddlepaddle/paddle:3.1.0-gpu-cuda12.6-cudnn9.5-trt10.5
+ docker pull paddlepaddle/paddle:3.1.1-gpu-cuda12.6-cudnn9.5
```
```
- docker pull paddlepaddle/paddle:3.1.0-gpu-cuda12.9-cudnn9.9-trt10.5
+ docker pull paddlepaddle/paddle:3.1.1-gpu-cuda12.9-cudnn9.9
```
You can also visit [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) for more images.
@@ -72,7 +72,7 @@
```
- docker run --name paddle_docker -it -v $PWD:/paddle ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0 /bin/bash
+ docker run --name paddle_docker -it -v $PWD:/paddle ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1 /bin/bash
```
- `--name paddle_docker`: sets the name of the Docker container; `paddle_docker` is a name of your choosing;
@@ -83,7 +83,7 @@
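- `-v $PWD:/paddle`: mounts the current path (the PWD variable expands to the absolute path of the current directory) to the /paddle directory inside the container;
- - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0`: specifies the name of the image to use, which you can view with the `docker images` command; /bin/bash is the command to run in Docker
+ - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1`: specifies the name of the image to use, which you can view with the `docker images` command; /bin/bash is the command to run in Docker
* Use the CPU version of PaddlePaddle, with jupyter pre-installed in the image: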
@@ -98,7 +98,7 @@
cd ./jupyter_docker
```
```
- docker run -p 80:80 --rm --env USER_PASSWD="password you set" -v $PWD:/home/paddle ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-jupyter
+ docker run -p 80:80 --rm --env USER_PASSWD="password you set" -v $PWD:/home/paddle ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-jupyter
```
- `--rm`: removes the container after it exits;
@@ -109,13 +109,13 @@
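- `-v $PWD:/home/paddle`: mounts the current path (the PWD variable expands to the absolute path of the current directory) to the /home/paddle directory inside the container;
- - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-jupyter`: specifies the name of the image to use, which you can view with the `docker images` command
+ - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-jupyter`: specifies the name of the image to use, which you can view with the `docker images` command
* Use the GPU version of PaddlePaddle: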
```
- docker run --gpus all --name paddle_docker -v $PWD:/paddle --network=host -it ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-gpu-cuda12.6-cudnn9.5-trt10.5 /bin/bash
+ docker run --gpus all --name paddle_docker -v $PWD:/paddle --network=host -it ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-gpu-cuda12.6-cudnn9.5 /bin/bash
```
- `--gpus all`: allows GPUs to be used in the Docker container;
@@ -127,7 +127,7 @@
- `-it`: keeps an interactive session with the host;
- - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-gpu-cuda12.6-cudnn9.5-trt10.5`: creates the Docker container from the image named `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle` with tag `3.1.0-gpu-cuda12.6-cudnn9.5-trt10.5`; /bin/bash starts the /bin/bash command after entering the container.
+ - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-gpu-cuda12.6-cudnn9.5`: creates the Docker container from the image named `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle` with tag `3.1.1-gpu-cuda12.6-cudnn9.5`; /bin/bash starts the /bin/bash command after entering the container.
@@ -146,24 +146,24 @@
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0 |
- CPU image with paddle 3.1.0 installed |
+ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1 |
+ CPU image with paddle 3.1.1 installed |
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-jupyter |
- CPU image with paddle 3.1.0 installed and jupyter pre-installed; the jupyter service runs as soon as docker starts |
+ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-jupyter |
+ CPU image with paddle 3.1.1 installed and jupyter pre-installed; the jupyter service runs as soon as docker starts |
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-gpu-cuda11.8-cudnn8.9-trt8.6 |
- GPU image with paddle 3.1.0 installed; cuda version 11.8, cudnn version 8.9, trt version 8.6 |
+ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-gpu-cuda11.8-cudnn8.9 |
+ GPU image with paddle 3.1.1 installed; cuda version 11.8, cudnn version 8.9 |
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-gpu-cuda12.6-cudnn9.5-trt10.5 |
- GPU image with paddle 3.1.0 installed; cuda version 12.6, cudnn version 9.5, trt version 10.5 |
+ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-gpu-cuda12.6-cudnn9.5 |
+ GPU image with paddle 3.1.1 installed; cuda version 12.6, cudnn version 9.5 |
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-gpu-cuda12.9-cudnn9.9-trt10.5 |
- GPU image with paddle 3.1.0 installed; cuda version 12.9, cudnn version 9.9, trt version 10.5 |
+ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-gpu-cuda12.9-cudnn9.9 |
+ GPU image with paddle 3.1.1 installed; cuda version 12.9, cudnn version 9.9 |
diff --git a/docs/install/docker/linux-docker_en.md b/docs/install/docker/linux-docker_en.md
index 1976dbb5489..e9d16e202a1 100644
--- a/docs/install/docker/linux-docker_en.md
+++ b/docs/install/docker/linux-docker_en.md
@@ -21,46 +21,46 @@ For domestic users, when downloading docker is slow due to network problems, you
* CPU version of PaddlePaddle:
```
- docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0
+ docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1
```
* CPU version of PaddlePaddle, and the image is pre-installed with jupyter:
```
- docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-jupyter
+ docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-jupyter
```
* GPU version of PaddlePaddle(**Latest version of gpu image is recommended, and make sure NVIDIA Container Toolkit is installed successfully**):
```
- docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-gpu-cuda11.8-cudnn8.9-trt8.6
+ docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-gpu-cuda11.8-cudnn8.9
```
```
- docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-gpu-cuda12.6-cudnn9.5-trt10.5
+ docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-gpu-cuda12.6-cudnn9.5
```
```
- docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-gpu-cuda12.9-cudnn9.9-trt10.5
+ docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-gpu-cuda12.9-cudnn9.9
```
If your machine is not in mainland China, you can pull the image directly from DockerHub:
* CPU version of PaddlePaddle:
```
- docker pull paddlepaddle/paddle:3.1.0
+ docker pull paddlepaddle/paddle:3.1.1
```
* CPU version of PaddlePaddle, and the image is pre-installed with jupyter:
```
- docker pull paddlepaddle/paddle:3.1.0-jupyter
+ docker pull paddlepaddle/paddle:3.1.1-jupyter
```
* GPU version of PaddlePaddle(**Latest version of gpu image is recommended, and make sure NVIDIA Container Toolkit is installed successfully**):
```
- docker pull paddlepaddle/paddle:3.1.0-gpu-cuda11.8-cudnn8.9-trt8.6
+ docker pull paddlepaddle/paddle:3.1.1-gpu-cuda11.8-cudnn8.9
```
```
- docker pull paddlepaddle/paddle:3.1.0-gpu-cuda12.6-cudnn9.5-trt10.5
+ docker pull paddlepaddle/paddle:3.1.1-gpu-cuda12.6-cudnn9.5
```
```
- docker pull paddlepaddle/paddle:3.1.0-gpu-cuda12.9-cudnn9.9-trt10.5
+ docker pull paddlepaddle/paddle:3.1.1-gpu-cuda12.9-cudnn9.9
```
You can see [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to get more images.
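The 3.1.1 image names in this hunk follow a regular pattern: the `-trt…` suffix is gone, and GPU tags are only `<version>-gpu-cuda<X>-cudnn<Y>`. A minimal sketch of that pattern follows. The helper `image_ref` and its parameters are hypothetical, introduced only for illustration; only the registry paths and tag shapes come from the lines above.

```python
# Hypothetical helper (not part of PaddlePaddle or Docker): it only
# assembles the image references shown in the pull commands above.
MIRROR_REPO = "ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle"
DOCKERHUB_REPO = "paddlepaddle/paddle"


def image_ref(version="3.1.1", cuda=None, cudnn=None, jupyter=False, mirror=True):
    """Return an image reference such as 'paddlepaddle/paddle:3.1.1-jupyter'."""
    repo = MIRROR_REPO if mirror else DOCKERHUB_REPO
    if cuda is not None:
        # GPU tags in this diff always carry both a CUDA and a cuDNN version.
        if cudnn is None:
            raise ValueError("a GPU tag needs both cuda and cudnn versions")
        tag = f"{version}-gpu-cuda{cuda}-cudnn{cudnn}"
    elif jupyter:
        tag = f"{version}-jupyter"
    else:
        tag = version
    return f"{repo}:{tag}"


if __name__ == "__main__":
    print(image_ref(cuda="12.6", cudnn="9.5"))
    print(image_ref(jupyter=True, mirror=False))
```

Whether a given tag actually exists still has to be checked against the registry; the sketch only reproduces the naming scheme.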
@@ -72,7 +72,7 @@ You can see [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to g
```
- docker run --name paddle_docker -it -v $PWD:/paddle ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0 /bin/bash
+ docker run --name paddle_docker -it -v $PWD:/paddle ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1 /bin/bash
```
- `--name paddle_docker`: set name of Docker, `paddle_docker` is name of docker you set;
@@ -83,7 +83,7 @@ You can see [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to g
- `-v $PWD:/paddle`: Specifies to mount the current path of the host (PWD variable in Linux will expand to the absolute path of the current path) to the /paddle directory inside the container;
- - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0`: Specify the name of the image to be used. You can view it through the 'docker images' command. /bin/Bash is the command to be executed in Docker
+ - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1`: Specify the name of the image to be used. You can view it through the `docker images` command. `/bin/bash` is the command to be executed in Docker
* Use GPU version of PaddlePaddle:
@@ -91,7 +91,7 @@ You can see [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to g
```
- docker run --gpus all --name paddle_docker -v $PWD:/paddle --network=host -it ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-gpu-cuda12.6-cudnn9.5-trt10.5 /bin/bash
+ docker run --gpus all --name paddle_docker -v $PWD:/paddle --network=host -it ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-gpu-cuda12.6-cudnn9.5 /bin/bash
```
- `--gpus all`: gpu resources can be used in Docker container;
@@ -104,7 +104,7 @@ You can see [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to g
- `-v $PWD:/paddle`: Specifies to mount the current path of the host (PWD variable in Linux will expand to the absolute path of the current path) to the /paddle directory inside the container;
- - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-gpu-cuda12.6-cudnn9.5-trt10.5`: Specify the name of the image to be used. You can view it through the 'docker images' command. /bin/Bash is the command to be executed in Docker
+ - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-gpu-cuda12.6-cudnn9.5`: Specify the name of the image to be used. You can view it through the `docker images` command. `/bin/bash` is the command to be executed in Docker
* Use CPU version of PaddlePaddle with jupyter:
@@ -120,7 +120,7 @@ You can see [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to g
cd ./jupyter_docker
```
```
- docker run -p 80:80 --rm --env USER_PASSWD="password you set" -v $PWD:/home/paddle ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-jupyter
+ docker run -p 80:80 --rm --env USER_PASSWD="password you set" -v $PWD:/home/paddle ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-jupyter
```
- `--rm`: Delete the container after closing it;
@@ -131,7 +131,7 @@ You can see [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to g
- `-v $PWD:/home/paddle`: Specifies to mount the current path (the PWD variable will be expanded to the absolute path of the current path) to the /home/paddle directory inside the container;
- - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-jupyter`: Specify the name of the image to be used, you can view it through the `docker images` command
+ - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-jupyter`: Specify the name of the image to be used, you can view it through the `docker images` command
Now you have successfully used Docker to install PaddlePaddle. For more information about using Docker, see[Docker official documents](https://docs.docker.com)
@@ -149,24 +149,24 @@ Now you have successfully used Docker to install PaddlePaddle. For more informat
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0 |
- CPU image with 3.1.0 version of paddle installed |
+ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1 |
+ CPU image with 3.1.1 version of paddle installed |
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-jupyter |
- CPU image of paddle version 3.1.0 is installed, and jupyter is pre-installed in the image. Start the docker to run the jupyter service |
+ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-jupyter |
+ CPU image with paddle 3.1.1 installed and jupyter pre-installed. Starting the container runs the jupyter service |
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-gpu-cuda11.8-cudnn8.9-trt8.6 |
- GPU image of paddle version 3.1.0 is installed, cuda version is 11.8, cudnn version is 8.9, trt version is 8.6 |
+ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-gpu-cuda11.8-cudnn8.9 |
+ GPU image with paddle 3.1.1 installed; CUDA version is 11.8, cuDNN version is 8.9 |
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-gpu-cuda12.6-cudnn9.5-trt10.5 |
- GPU image of paddle version 3.1.0 is installed, cuda version is 12.6, cudnn version is 9.5, trt version is 10.5 |
+ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-gpu-cuda12.6-cudnn9.5 |
+ GPU image with paddle 3.1.1 installed; CUDA version is 12.6, cuDNN version is 9.5 |
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-gpu-cuda12.9-cudnn9.9-trt10.5 |
- GPU image of paddle version 3.1.0 is installed, cuda version is 12.9, cudnn version is 9.9, trt version is 10.5 |
+ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-gpu-cuda12.9-cudnn9.9 |
+ GPU image with paddle 3.1.1 installed; CUDA version is 12.9, cuDNN version is 9.9 |
diff --git a/docs/install/docker/macos-docker.md b/docs/install/docker/macos-docker.md
index 0a98a031735..d7cbbb6e879 100644
--- a/docs/install/docker/macos-docker.md
+++ b/docs/install/docker/macos-docker.md
@@ -19,24 +19,24 @@
* CPU 版的 PaddlePaddle:
```
- docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0
+ docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1
```
* CPU 版的 PaddlePaddle,且镜像中预装好了 jupyter:
```
- docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-jupyter
+ docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-jupyter
```
如果您的机器不在中国大陆地区,可以直接从 DockerHub 拉取镜像:
* CPU 版的 PaddlePaddle:
```
- docker pull paddlepaddle/paddle:3.1.0
+ docker pull paddlepaddle/paddle:3.1.1
```
* CPU 版的 PaddlePaddle,且镜像中预装好了 jupyter:
```
- docker pull paddlepaddle/paddle:3.1.0-jupyter
+ docker pull paddlepaddle/paddle:3.1.1-jupyter
```
您还可以访问[DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/)获取更多镜像。
@@ -48,7 +48,7 @@
```
- docker run --name paddle_docker -it -v $PWD:/paddle ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0 /bin/bash
+ docker run --name paddle_docker -it -v $PWD:/paddle ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1 /bin/bash
```
- `--name paddle_docker`:设定 Docker 的名称,`paddle_docker` 是自己设置的名称;
@@ -59,7 +59,7 @@
- `-v $PWD:/paddle`:指定将当前路径(PWD 变量会展开为当前路径的绝对路径)挂载到容器内部的 /paddle 目录;
- - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0`:指定需要使用的 image 名称,您可以通过`docker images`命令查看;/bin/bash 是在 Docker 中要执行的命令
+ - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1`:指定需要使用的 image 名称,您可以通过`docker images`命令查看;/bin/bash 是在 Docker 中要执行的命令
* 使用 CPU 版本的 PaddlePaddle,且镜像中预装好了 jupyter:
@@ -73,7 +73,7 @@
cd ./jupyter_docker
```
```
- docker run -p 80:80 --rm --env USER_PASSWD="password you set" -v $PWD:/home/paddle ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-jupyter
+ docker run -p 80:80 --rm --env USER_PASSWD="password you set" -v $PWD:/home/paddle ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-jupyter
```
- `--rm`:关闭容器后删除容器;
@@ -84,7 +84,7 @@
- `-v $PWD:/home/paddle`:指定将当前路径(PWD 变量会展开为当前路径的绝对路径)挂载到容器内部的 /home/paddle 目录;
- - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-jupyter`:指定需要使用的 image 名称,您可以通过`docker images`命令查看
+ - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-jupyter`:指定需要使用的 image 名称,您可以通过`docker images`命令查看
@@ -104,12 +104,12 @@
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0 |
- 安装了 3.1.0 版本 paddle 的 CPU 镜像 |
+ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1 |
+ 安装了 3.1.1 版本 paddle 的 CPU 镜像 |
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-jupyter |
- 安装了 3.1.0 版本 paddle 的 CPU 镜像,且镜像中预装好了 jupyter,启动 docker 即运行 jupyter 服务 |
+ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-jupyter |
+ 安装了 3.1.1 版本 paddle 的 CPU 镜像,且镜像中预装好了 jupyter,启动 docker 即运行 jupyter 服务 |
diff --git a/docs/install/docker/macos-docker_en.md b/docs/install/docker/macos-docker_en.md
index ac129bbe2ea..1cc4510d18b 100644
--- a/docs/install/docker/macos-docker_en.md
+++ b/docs/install/docker/macos-docker_en.md
@@ -19,24 +19,24 @@ For domestic users, when downloading docker is slow due to network problems, you
* CPU version of PaddlePaddle:
```
- docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0
+ docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1
```
* CPU version of PaddlePaddle, and the image is pre-installed with jupyter:
```
- docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-jupyter
+ docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-jupyter
```
If your machine is not in mainland China, you can pull the image directly from DockerHub:
* CPU version of PaddlePaddle:
```
- docker pull paddlepaddle/paddle:3.1.0
+ docker pull paddlepaddle/paddle:3.1.1
```
* CPU version of PaddlePaddle, and the image is pre-installed with jupyter:
```
- docker pull paddlepaddle/paddle:3.1.0-jupyter
+ docker pull paddlepaddle/paddle:3.1.1-jupyter
```
You can see [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to get more images.
@@ -48,7 +48,7 @@ You can see [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to g
```
- docker run --name paddle_docker -it -v $PWD:/paddle ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0 /bin/bash
+ docker run --name paddle_docker -it -v $PWD:/paddle ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1 /bin/bash
```
- `--name paddle_docker`: set name of Docker, `paddle_docker` is name of docker you set;
@@ -59,7 +59,7 @@ You can see [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to g
- `-v $PWD:/paddle`: Specifies to mount the current path of the host (PWD variable in Linux will expand to the absolute path of the current path) to the /paddle directory inside the container;
- - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0`: Specify the name of the image to be used. You can view it through the 'docker images' command. /bin/Bash is the command to be executed in Docker
+ - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1`: Specify the name of the image to be used. You can view it through the `docker images` command. `/bin/bash` is the command to be executed in Docker
* Use CPU version of PaddlePaddle with jupyter:
@@ -75,7 +75,7 @@ You can see [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to g
cd ./jupyter_docker
```
```
- docker run -p 80:80 --rm --env USER_PASSWD="password you set" -v $PWD:/home/paddle ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-jupyter
+ docker run -p 80:80 --rm --env USER_PASSWD="password you set" -v $PWD:/home/paddle ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-jupyter
```
- `--rm`: Delete the container after closing it;
@@ -86,7 +86,7 @@ You can see [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to g
- `-v $PWD:/home/paddle`: Specifies to mount the current path (the PWD variable will be expanded to the absolute path of the current path) to the /home/paddle directory inside the container;
- - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-jupyter`: Specify the name of the image to be used, you can view it through the `docker images` command
+ - `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-jupyter`: Specify the name of the image to be used, you can view it through the `docker images` command
@@ -105,12 +105,12 @@ Now you have successfully used Docker to install PaddlePaddle. For more informat
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0 |
- CPU image with 3.1.0 version of paddle installed |
+ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1 |
+ CPU image with 3.1.1 version of paddle installed |
- | ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.0-jupyter |
- CPU image of paddle version 3.1.0 is installed, and jupyter is pre-installed in the image. Start the docker to run the jupyter service |
+ ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.1.1-jupyter |
+ CPU image with paddle 3.1.1 installed and jupyter pre-installed. Starting the container runs the jupyter service |
diff --git a/docs/install/index_cn.rst b/docs/install/index_cn.rst
index 69ae34f9666..b030bfeebcc 100644
--- a/docs/install/index_cn.rst
+++ b/docs/install/index_cn.rst
@@ -39,7 +39,7 @@
**第一种安装方式:使用 pip 安装**
-您可以选择“使用 pip 安装”、“使用 conda 安装”、“使用 docker 安装”、“从源码编译安装” 四种方式中的任意一种方式进行安装。
+您可以选择“使用 pip 安装”、“使用 docker 安装”、“从源码编译安装” 三种方式中的任意一种方式进行安装。
本节将介绍使用 pip 的安装方式。
@@ -140,7 +140,6 @@
:hidden:
pip/frompip.rst
- conda/fromconda.rst
docker/fromdocker.rst
compile/fromsource.rst
install_xpu_cn.md
diff --git a/docs/install/index_en.rst b/docs/install/index_en.rst
index 69fa8f1b188..48d47ebef21 100644
--- a/docs/install/index_en.rst
+++ b/docs/install/index_en.rst
@@ -42,7 +42,7 @@ The manuals will guide you to build and install PaddlePaddle on your 64-bit desk
* Python and pip requires 64-bit
**First Installation Method: Using pip for installation**
-You can choose any of the four methods: "Using pip for installation", "Using conda for installation", "Using docker for installation", "Compiling from source code for installation".
+You can choose any of the three methods: "Using pip for installation", "Using docker for installation", "Compiling from source code for installation".
This section will introduce the installation method using pip.
@@ -139,7 +139,6 @@ This section will introduce the installation method using pip.
:hidden:
pip/frompip_en.rst
- conda/fromconda_en.rst
docker/fromdocker_en.rst
compile/fromsource_en.rst
install_NGC_PaddlePaddle_en.rst
diff --git a/docs/install/pip/linux-pip.md b/docs/install/pip/linux-pip.md
index 2b4d9418874..d7b7bab3f44 100644
--- a/docs/install/pip/linux-pip.md
+++ b/docs/install/pip/linux-pip.md
@@ -69,7 +69,7 @@
```
- python3 -m pip install paddlepaddle==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
+ python3 -m pip install paddlepaddle==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
```
@@ -80,7 +80,7 @@
```
- python3 -m pip install paddlepaddle-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
+ python3 -m pip install paddlepaddle-gpu==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
```
@@ -88,14 +88,14 @@
```
- python3 -m pip install paddlepaddle-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
+ python3 -m pip install paddlepaddle-gpu==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
```
2.2.3 CUDA12.9 的 PaddlePaddle(如果需要使用 TensorRT 可自行安装 TensorRT10.5.0.18)
```
- python3 -m pip install paddlepaddle-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu129/
+ python3 -m pip install paddlepaddle-gpu==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu129/
```
diff --git a/docs/install/pip/linux-pip_en.md b/docs/install/pip/linux-pip_en.md
index 36eafe1c76b..774a8048c4b 100644
--- a/docs/install/pip/linux-pip_en.md
+++ b/docs/install/pip/linux-pip_en.md
@@ -80,7 +80,7 @@ You can choose the following version of PaddlePaddle to start installation:
```
- python3 -m pip install paddlepaddle==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
+ python3 -m pip install paddlepaddle==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
```
@@ -91,20 +91,20 @@ You can choose the following version of PaddlePaddle to start installation:
```
- python3 -m pip install paddlepaddle-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
+ python3 -m pip install paddlepaddle-gpu==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
```
2.2.2 If you are using CUDA 12.6(If you need to use TensorRT, you can install TensorRT 10.5.0.18 yourself)
```
- python3 -m pip install paddlepaddle-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
+ python3 -m pip install paddlepaddle-gpu==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
```
2.2.3 If you are using CUDA 12.9(If you need to use TensorRT, you can install TensorRT 10.5.0.18 yourself)
```
- python3 -m pip install paddlepaddle-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu129/
+ python3 -m pip install paddlepaddle-gpu==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu129/
```
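The pip commands in these hunks differ only in the package name and the last path segment of the index URL (`cpu`, `cu118`, `cu126`, `cu129`). A minimal sketch of that mapping follows. `pip_install_args` and `CUDA_CHANNELS` are hypothetical names used for illustration; the package names, version, and URLs are the ones shown in the diff.

```python
# Hypothetical helper: builds the argument list for the install commands
# shown above. It does not run pip and does not check availability.
INDEX_BASE = "https://www.paddlepaddle.org.cn/packages/stable"

# CUDA versions listed in this diff, mapped to their index path segment.
CUDA_CHANNELS = {"11.8": "cu118", "12.6": "cu126", "12.9": "cu129"}


def pip_install_args(cuda=None, version="3.1.1", python="python3"):
    """Return the command as a list suitable for subprocess.run."""
    if cuda is None:
        package, channel = "paddlepaddle", "cpu"
    else:
        if cuda not in CUDA_CHANNELS:
            raise ValueError(f"no index listed here for CUDA {cuda}")
        package, channel = "paddlepaddle-gpu", CUDA_CHANNELS[cuda]
    return [
        python, "-m", "pip", "install",
        f"{package}=={version}",
        "-i", f"{INDEX_BASE}/{channel}/",
    ]


if __name__ == "__main__":
    print(" ".join(pip_install_args()))
    print(" ".join(pip_install_args(cuda="12.6")))
```

On Windows the docs use `python` rather than `python3`, which is why the interpreter name is a parameter in this sketch.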
diff --git a/docs/install/pip/macos-pip.md b/docs/install/pip/macos-pip.md
index 7d07bf871ac..b9953b9f581 100644
--- a/docs/install/pip/macos-pip.md
+++ b/docs/install/pip/macos-pip.md
@@ -61,7 +61,7 @@
```
- python3 -m pip install paddlepaddle==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
+ python3 -m pip install paddlepaddle==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
```
diff --git a/docs/install/pip/macos-pip_en.md b/docs/install/pip/macos-pip_en.md
index 06f2b986db1..248428a166f 100644
--- a/docs/install/pip/macos-pip_en.md
+++ b/docs/install/pip/macos-pip_en.md
@@ -61,7 +61,7 @@ You can choose the following version of PaddlePaddle to start installation:
```
-python3 -m pip install paddlepaddle==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
+python3 -m pip install paddlepaddle==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
```
Note:
diff --git a/docs/install/pip/windows-pip.md b/docs/install/pip/windows-pip.md
index e5a1d74a4d0..76499c5abdd 100644
--- a/docs/install/pip/windows-pip.md
+++ b/docs/install/pip/windows-pip.md
@@ -57,7 +57,7 @@
```
- python -m pip install paddlepaddle==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
+ python -m pip install paddlepaddle==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
```
@@ -67,18 +67,18 @@
2.2.1 CUDA11.8 的 PaddlePaddle(如果需要使用 TensorRT 可自行安装 TensorRT8.5.1.7)
```
- python -m pip install paddlepaddle-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
+ python -m pip install paddlepaddle-gpu==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
```
2.2.2 CUDA12.6 的 PaddlePaddle(如果需要使用 TensorRT 可自行安装 TensorRT10.5.0.18)
```
- python -m pip install paddlepaddle-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
+ python -m pip install paddlepaddle-gpu==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
```
2.2.3 CUDA12.9 的 PaddlePaddle(如果需要使用 TensorRT 可自行安装 TensorRT10.5.0.18)
```
- python -m pip install paddlepaddle-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu129/
+ python -m pip install paddlepaddle-gpu==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu129/
```
注:
diff --git a/docs/install/pip/windows-pip_en.md b/docs/install/pip/windows-pip_en.md
index 8ca4d6348c2..eb405a5acd0 100644
--- a/docs/install/pip/windows-pip_en.md
+++ b/docs/install/pip/windows-pip_en.md
@@ -53,7 +53,7 @@ You can choose the following version of PaddlePaddle to start installation:
```
- python -m pip install paddlepaddle==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
+ python -m pip install paddlepaddle==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
```
@@ -63,19 +63,19 @@ You can choose the following version of PaddlePaddle to start installation:
2.2.1 If you are using CUDA 11.8(If you need to use TensorRT, you can install TensorRT 8.5.1.7 yourself)
```
- python -m pip install paddlepaddle-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
+ python -m pip install paddlepaddle-gpu==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
```
2.2.2 If you are using CUDA 12.6(If you need to use TensorRT, you can install TensorRT 10.5.0.18 yourself)
```
- python -m pip install paddlepaddle-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
+ python -m pip install paddlepaddle-gpu==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
```
2.2.3 If you are using CUDA 12.9(If you need to use TensorRT, you can install TensorRT 10.5.0.18 yourself)
```
- python -m pip install paddlepaddle-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu129/
+ python -m pip install paddlepaddle-gpu==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu129/
```
Note:
diff --git a/docs/release_note_cn.md b/docs/release_note_cn.md
index e39d3f2e44c..3cd2990ad98 100644
--- a/docs/release_note_cn.md
+++ b/docs/release_note_cn.md
@@ -1,541 +1,298 @@
-# 3.0 Release Note
+# 3.1 Release Note
-作为中国首个自主研发的产业级深度学习平台,飞桨一直坚持开源路线,支撑产业智能化升级。飞桨框架 3.0 版本不仅延续了飞桨框架 2.0 系列动静统一、训推一体的特性,更在自动并行、神经网络编译器、高阶自动微分等方面取得突破,为大模型时代的技术创新与产业应用提供了强大支撑,为开发者打造了一站式、高性能的深度学习开发体验。无论是前沿算法研究还是产业级大模型落地,飞桨框架 3.0 都将成为开发者的首选利器。重点特性说明如下:
+飞桨框架 3.1 版本针对核心功能自动并行进一步优化打磨,提升易用性和性能表现;同时提供 FP8 低精度训练支持,大模型训练速度提升 10-20%;完善硬件扩展机制,降低类 CUDA 硬件的适配成本,用户仅需注册 kernel;同时对框架基础能力进行增强,提升框架稳定性。重点更新功能如下:
-- **动静统一自动并行:** 这一功能大幅度降低了产业开发和训练的成本。用户只需在单卡基础上进行少量的张量切分标记,飞桨框架便会自动完成分布式切分信息的推导,并添加通信算子以确保逻辑的正确性。同时,根据模型结构和集群信息,结合显存和调度层的优化,飞桨能自动寻找最高效的分布式并行策略,从而大幅降低混合并行训练的开发成本,使开发者能够更专注于模型和算法的创新。自动并行架构进行了深入的验证和打磨,以更好地支持纯文稠密模型、纯文稀疏模型(MoE)和多模态理解模型等常见大模型场景的预训练+精调流程;完善算子的切分推导规则,并支持将自动并行训练参数转化成手动并行参数进行下游推理,自动并行达到了全面可用的状态,帮助用户降低大模型并行程序的开发成本。同时,为了进一步简化用户的分布式开发流程,推出全新的`paddle.distributed.parallel`接口,基于对分布式张量标记语法的封装,支持用户在模型组网外不侵入地配置数据并行、模型并行、流水并行等常见的并行策略。此外,静态图自动并行架构基于 PIR 完成了全面的升级,底层的基础组件、核心模块、并行策略和性能优化策略均统一基于扩展的 PIR `DistDialect`进行实现,进一步增强了自动并行的动静一致性,并在 Llama 系列模型上性能达到了持平甚至领先手动并行方式的水平。
-- **大模型训推一体:** 自 2.0 版本起,飞桨便采用了“动静统一、训推一体”的设计理念,3.0 版本也将继续秉持这一理念。得益于动静统一的架构和接口设计,飞桨能够完整支持动态图和静态图这两种不同的运行模式,并且具备出色的整图导出能力。飞桨的动转静整图导出成功率高达 95%,高于 PyTorch 的 62%。“训推一体”意味着能够在同一套框架下,尽可能复用训练和推理的代码,特别是复用模型组网代码。在完成模型的开发训练后,只需进行少量的开发工作,即可实现快速推理部署。这一特性为产业提供了极致的开发体验。它使训练和推理的能力能够相互复用,为大模型的全流程提供了统一的开发体验和极致的训练效率。通过动转静的工作,训练和推理的工作得以无缝衔接。支持多款主流大模型、DeepSeek-R1 满血版实现单机部署,吞吐提升一倍。
-- **科学计算高阶微分:** 飞桨框架 3.0 为科学计算提供了高阶自动微分、编译优化和分布式训练能力的支撑。英伟达 Modulus 的 41 个不同方程实验显示,飞桨的微分方程求解速度比 PyTorch 开启编译器优化后的版本平均快 115%。同时,飞桨还建设了面向通用数理问题求解的赛桨 PaddleScience 以及专注于生物计算的螺旋桨 PaddleHelix 工具包。此外,飞桨框架 3.0 还原生支持复数技术体系,这对于气象预报、汽车飞行器气动分析等场景下的数据特征分析具有重要意义。
-- **神经网络编译器:** 这一功能显著降低了性能优化的成本。飞桨的编译器采用与框架一体化的设计,能够支持生成式模型、科学计算模型等多种模型的高效训练与可变形状推理,在计算灵活性与高性能之间提供了良好的平衡点。使用 CINN 编译器后超过 60%的 模型有显著性能提升,平均提升达 27.4%。CINN 神经网络编译器在完备性、性能表现等方面效果全面提升。此版本中,我们对编译器前端、后端各个环节进行了全面优化:包括新增反向计算图自动 Re-Compute 机制、前端 Pass 性能优化、符号推导机制升级、算子融合策略优化、后端 Schedule 策略和下标表达式化简能力增强等,同时排查并修复了大量正确性和性能问题,系统化的提升了编译器的通用优化能力。
-- **异构多芯适配:** 飞桨的重要特色之一是适配异构多芯并充分释放硬件潜能。在接入机制上,飞桨提供了简洁高效的抽象接口和基础算子体系,降低了适配成本。在运行机制上,它优化了调度编排和存储共享等机制,提升了调度效率。从算子内核角度,飞桨提供了编译器自动融合调优方案,以提升端到端的性能。同时,飞桨还为新硬件厂商建设了代码合入、持续集成、模型回归测试等研发基础设施。这些机制保障了新硬件被纳入飞桨的正常发版体系中,用户无需编译即可直接安装试用。飞桨这种功能完善、低成本接入的机制吸引了硬件厂商共同为飞桨贡献了 4001 个 PR,共包含 26584 个 commits。
-
-除了上述核心特性外,**高扩展中间表示**为了提升飞桨框架的可扩展性,我们研发了高扩展中间表示 PIR(Paddle Intermediate Representation)。这一表示系统性地抽象了底层核心概念,提供了灵活且高效的组件。PIR 作为基础设施,支撑着动转静、自动微分、自动并行、组合算子、图优化等多项技术,并广泛应用于分布式训练、模型压缩、推理部署等场景。通过 PIR 提供的 DRR(Declarative Rewrite Rule)机制,Pass 的开发成本可以降低 60%。同时 PIR 完成在全场景的验证,并默认开启,支持一键动转静,保证了框架卓越的性能表现和良好的拓展性。对框架 2.0 版已有功能的持续改进,同时新特性在使用体验、性能、二次开发便利度以及硬件适配能力等方面带来了显著提升。此版本在用户体验层面持续丰富并增强了满足更多场景的 API 功能,针对大模型场景优化完善了分布式并行策略优化和推理功能增强,在编译安装方面做了比较彻底的易用性改进,对依赖包的安装方式和版本进行了全新同步升级,对系统安全进行了全面加固,对产品文档也进行了全面的纠错检查,同时也对一些废弃代码做了大量的清理以保证架构的简洁性。
-
-## 不兼容升级
-
-飞桨 API 支持隐式类型提升。在加减乘除等最常用的计算中,如果两个输入的数据类型不一样,就需要确定输出的数据类型问题。飞桨历史上的现状是部分支持且实际规则并不清楚,客观上表现为动静不一致、API 和运算符重载不一致 及 不符合交换率,特别是在大模型广泛使用 bf16/fp16 与 fp32 进行混合计算时容易出现非预期问题且难以定位。飞桨从 3.0 beta 版本开始,明确了[隐式数据类型提升规则](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/advanced/auto_type_promotion_cn.html),其中详细定义了 Tensor 与 Tensor 和 Tensor 与 1 个数(Scalar)计算结果的类型,保证了计算符合交换律,运算符重载与二元 API 结果一致,动态图与静态图结果一致。更符合用户理解和业界习惯。https://github.com/PaddlePaddle/Paddle/pull/60638, https://github.com/PaddlePaddle/Paddle/pull/63842, https://github.com/PaddlePaddle/Paddle/pull/60011
-
-## 废弃功能
-
-支持 0 维 Tensor 已经稳定了 2 个版本,本版本取消了在一些情况下将 0 维 Tensor 转成只含 1 个元素的 1 维 Tensor 的开关 FLAGS_set_to_1d,这个开关是为了兼容一些套件中用 1 个元素的 1 维 Tensor 表示 0 维 Tensor 的不正确写法。即当前飞桨完全区分 0 维 Tensor 和只含 1 个元素的 1 维 Tensor 的语义,两者不等价。https://github.com/PaddlePaddle/Paddle/pull/61227
+- **自动并行架构:** 自动并行架构进一步打磨,以提高自动并行核心机制的易用性和动态图性能。完善了自动并行核心机制,包括新增多个算子的切分推导规则,支持分布式张量的同一维度被多个 mesh 维度切分,支持动态图并行策略(PP,CP,SEP,TP-CONV)等。同时,对动态图自动并行系统地做了性能优化,在 Llama2、Qwen、Baichuan 等系列模型上性能基本持平手动并行。
+- **低精度训练:** 基于 blockwise 的 FP8 GEMM 算子,支持低精度训练,训练精度媲美 BF16,大模型训练速度提升 10-20%。
+- **异构多芯适配:** 提供类 CUDA 算子复用机制,仅需注册即可使用对应 kernel。
+- **框架稳定性增强:** 系统性修复算子在 0-Size 和大维度情况下计算结果错误的问题。
## 1. 用户体验升级
-### 新特性
+API 功能增强、Bug 修复与改进,旨在提升用户体验和 API 的易用性。新增了`paddle.randn_like` API,修复了多个 API 的功能缺陷,并增强了对复数类型和 0-Size Tensor 的支持。文档和代码也进行了相应的更新和优化,以提升整体的准确性和专业性。
-- 新增飞桨 API,扩展飞桨功能。包括 `paddle.nn.FeatureAlphaDropout`, `paddle.cartesian_prod`, `paddle.distributed.to_distributed`, `paddle.pi` 等。[#64881](https://github.com/PaddlePaddle/Paddle/pull/64881), [#65605](https://github.com/PaddlePaddle/Paddle/pull/65605), [#70757](https://github.com/PaddlePaddle/Paddle/pull/70757), [#71030](https://github.com/PaddlePaddle/Paddle/pull/71030), [#69946](https://github.com/PaddlePaddle/Paddle/pull/69946), [#70021](https://github.com/PaddlePaddle/Paddle/pull/70021), [#69613](https://github.com/PaddlePaddle/Paddle/pull/69613), [#68123](https://github.com/PaddlePaddle/Paddle/pull/68123), [#70032](https://github.com/PaddlePaddle/Paddle/pull/70032)
-- 新增 Tensor 类方法和属性,及新增相关单测,使得 Tensor 更易用。[#68334](https://github.com/PaddlePaddle/Paddle/pull/68334), [#68681](https://github.com/PaddlePaddle/Paddle/pull/68681), [#69132](https://github.com/PaddlePaddle/Paddle/pull/69132), [#69270](https://github.com/PaddlePaddle/Paddle/pull/69270), [#69256](https://github.com/PaddlePaddle/Paddle/pull/69256), [#69197](https://github.com/PaddlePaddle/Paddle/pull/69197), [#69231](https://github.com/PaddlePaddle/Paddle/pull/69231), [#69222](https://github.com/PaddlePaddle/Paddle/pull/69222), [#69257](https://github.com/PaddlePaddle/Paddle/pull/69257), [#69301](https://github.com/PaddlePaddle/Paddle/pull/69301), [#69361](https://github.com/PaddlePaddle/Paddle/pull/69361), [#69348](https://github.com/PaddlePaddle/Paddle/pull/69348), [#69464](https://github.com/PaddlePaddle/Paddle/pull/69464), [#69542](https://github.com/PaddlePaddle/Paddle/pull/69542), [#69667](https://github.com/PaddlePaddle/Paddle/pull/69667), [#69563](https://github.com/PaddlePaddle/Paddle/pull/69563), [#69796](https://github.com/PaddlePaddle/Paddle/pull/69796), [#69477](https://github.com/PaddlePaddle/Paddle/pull/69477), [#69779](https://github.com/PaddlePaddle/Paddle/pull/69779), [#69724](https://github.com/PaddlePaddle/Paddle/pull/69724), [#69835](https://github.com/PaddlePaddle/Paddle/pull/69835), [#69781](https://github.com/PaddlePaddle/Paddle/pull/69781), [#69982](https://github.com/PaddlePaddle/Paddle/pull/69982), [#69913](https://github.com/PaddlePaddle/Paddle/pull/69913), [#70026](https://github.com/PaddlePaddle/Paddle/pull/70026), [#70013](https://github.com/PaddlePaddle/Paddle/pull/70013), [#69539](https://github.com/PaddlePaddle/Paddle/pull/69539), [#69736](https://github.com/PaddlePaddle/Paddle/pull/69736), [#69841](https://github.com/PaddlePaddle/Paddle/pull/69841), [#70277](https://github.com/PaddlePaddle/Paddle/pull/70277), [#69580](https://github.com/PaddlePaddle/Paddle/pull/69580), [#69599](https://github.com/PaddlePaddle/Paddle/pull/69599), 
[#69693](https://github.com/PaddlePaddle/Paddle/pull/69693), [#69848](https://github.com/PaddlePaddle/Paddle/pull/69848), [#69751](https://github.com/PaddlePaddle/Paddle/pull/69751), [#70556](https://github.com/PaddlePaddle/Paddle/pull/70556), [#70591](https://github.com/PaddlePaddle/Paddle/pull/70591), [#69673](https://github.com/PaddlePaddle/Paddle/pull/69673), [#70647](https://github.com/PaddlePaddle/Paddle/pull/70647), [#68192](https://github.com/PaddlePaddle/Paddle/pull/68192), [#68511](https://github.com/PaddlePaddle/Paddle/pull/68511), [#68833](https://github.com/PaddlePaddle/Paddle/pull/68833), [#69406](https://github.com/PaddlePaddle/Paddle/pull/69406), [#69480](https://github.com/PaddlePaddle/Paddle/pull/69480), [#69463](https://github.com/PaddlePaddle/Paddle/pull/69463), [#69632](https://github.com/PaddlePaddle/Paddle/pull/69632), [#69473](https://github.com/PaddlePaddle/Paddle/pull/69473), [#68694](https://github.com/PaddlePaddle/Paddle/pull/68694), [#69534](https://github.com/PaddlePaddle/Paddle/pull/69534), [#69820](https://github.com/PaddlePaddle/Paddle/pull/69820), [#70121](https://github.com/PaddlePaddle/Paddle/pull/70121)
-
-### API 功能增强
+### 新特性
-- 增强了 43 个 API 的功能,使得已有 API 更易用,也更容易进行代码转换。包括但不限于增加 API 参数,扩展 API 支持的数据类型,以及修正原有不合理设计等。[#65105](https://github.com/PaddlePaddle/Paddle/pull/65105), [#65103](https://github.com/PaddlePaddle/Paddle/pull/65103), [#62975](https://github.com/PaddlePaddle/Paddle/pull/62975), [#64436](https://github.com/PaddlePaddle/Paddle/pull/64436), [#63346](https://github.com/PaddlePaddle/Paddle/pull/63346), [#68079](https://github.com/PaddlePaddle/Paddle/pull/68079), [#67878](https://github.com/PaddlePaddle/Paddle/pull/67878), [#68432](https://github.com/PaddlePaddle/Paddle/pull/68432), [#68677](https://github.com/PaddlePaddle/Paddle/pull/68677), [#69012](https://github.com/PaddlePaddle/Paddle/pull/69012), [#69385](https://github.com/PaddlePaddle/Paddle/pull/69385), [#65032](https://github.com/PaddlePaddle/Paddle/pull/65032), [#64977](https://github.com/PaddlePaddle/Paddle/pull/64977), [#67071](https://github.com/PaddlePaddle/Paddle/pull/67071), [#67298](https://github.com/PaddlePaddle/Paddle/pull/67298), [#66687](https://github.com/PaddlePaddle/Paddle/pull/66687), [#65946](https://github.com/PaddlePaddle/Paddle/pull/65946), [#66170](https://github.com/PaddlePaddle/Paddle/pull/66170), [#66929](https://github.com/PaddlePaddle/Paddle/pull/66929), [#67994](https://github.com/PaddlePaddle/Paddle/pull/67994), [#67947](https://github.com/PaddlePaddle/Paddle/pull/67947), [#68033](https://github.com/PaddlePaddle/Paddle/pull/68033), [#68046](https://github.com/PaddlePaddle/Paddle/pull/68046), [#68294](https://github.com/PaddlePaddle/Paddle/pull/68294), [#68214](https://github.com/PaddlePaddle/Paddle/pull/68214), [#68281](https://github.com/PaddlePaddle/Paddle/pull/68281), [#68390](https://github.com/PaddlePaddle/Paddle/pull/68390), [#68772](https://github.com/PaddlePaddle/Paddle/pull/68772), [#69451](https://github.com/PaddlePaddle/Paddle/pull/69451), [#69252](https://github.com/PaddlePaddle/Paddle/pull/69252), [#69529](https://github.com/PaddlePaddle/Paddle/pull/69529), 
[#69750](https://github.com/PaddlePaddle/Paddle/pull/69750), [#69827](https://github.com/PaddlePaddle/Paddle/pull/69827), [#69099](https://github.com/PaddlePaddle/Paddle/pull/69099), [#68594](https://github.com/PaddlePaddle/Paddle/pull/68594), [#70090](https://github.com/PaddlePaddle/Paddle/pull/70090), [#70228](https://github.com/PaddlePaddle/Paddle/pull/70228), [#70166](https://github.com/PaddlePaddle/Paddle/pull/70166), [#70389](https://github.com/PaddlePaddle/Paddle/pull/70389), [#70790](https://github.com/PaddlePaddle/Paddle/pull/70790), [#71029](https://github.com/PaddlePaddle/Paddle/pull/71029), [#71283](https://github.com/PaddlePaddle/Paddle/pull/71283), [#71342](https://github.com/PaddlePaddle/Paddle/pull/71342)
-- PaddlePaddle Python APIs now fully support type hints. The parameters and return values of all Python APIs are annotated with type hints to ease development and use. [#65209](https://github.com/PaddlePaddle/Paddle/pull/65209), [#65201](https://github.com/PaddlePaddle/Paddle/pull/65201), [#65190](https://github.com/PaddlePaddle/Paddle/pull/65190), [#65082](https://github.com/PaddlePaddle/Paddle/pull/65082), [#65226](https://github.com/PaddlePaddle/Paddle/pull/65226), [#65076](https://github.com/PaddlePaddle/Paddle/pull/65076), [#65238](https://github.com/PaddlePaddle/Paddle/pull/65238), [#65236](https://github.com/PaddlePaddle/Paddle/pull/65236), [#65247](https://github.com/PaddlePaddle/Paddle/pull/65247), [#65249](https://github.com/PaddlePaddle/Paddle/pull/65249), [#65244](https://github.com/PaddlePaddle/Paddle/pull/65244), [#65272](https://github.com/PaddlePaddle/Paddle/pull/65272), [#65191](https://github.com/PaddlePaddle/Paddle/pull/65191), [#65290](https://github.com/PaddlePaddle/Paddle/pull/65290), [#65255](https://github.com/PaddlePaddle/Paddle/pull/65255), [#65292](https://github.com/PaddlePaddle/Paddle/pull/65292), [#65300](https://github.com/PaddlePaddle/Paddle/pull/65300), [#65301](https://github.com/PaddlePaddle/Paddle/pull/65301), [#65332](https://github.com/PaddlePaddle/Paddle/pull/65332), [#65323](https://github.com/PaddlePaddle/Paddle/pull/65323), [#65326](https://github.com/PaddlePaddle/Paddle/pull/65326), [#65273](https://github.com/PaddlePaddle/Paddle/pull/65273), [#65317](https://github.com/PaddlePaddle/Paddle/pull/65317), [#65354](https://github.com/PaddlePaddle/Paddle/pull/65354), [#65283](https://github.com/PaddlePaddle/Paddle/pull/65283), [#65372](https://github.com/PaddlePaddle/Paddle/pull/65372), [#65337](https://github.com/PaddlePaddle/Paddle/pull/65337), [#65085](https://github.com/PaddlePaddle/Paddle/pull/65085), [#65382](https://github.com/PaddlePaddle/Paddle/pull/65382), [#65381](https://github.com/PaddlePaddle/Paddle/pull/65381), [#65378](https://github.com/PaddlePaddle/Paddle/pull/65378), 
[#65274](https://github.com/PaddlePaddle/Paddle/pull/65274), [#65380](https://github.com/PaddlePaddle/Paddle/pull/65380), [#65386](https://github.com/PaddlePaddle/Paddle/pull/65386), [#65351](https://github.com/PaddlePaddle/Paddle/pull/65351), [#65284](https://github.com/PaddlePaddle/Paddle/pull/65284), [#65366](https://github.com/PaddlePaddle/Paddle/pull/65366), [#65308](https://github.com/PaddlePaddle/Paddle/pull/65308), [#65375](https://github.com/PaddlePaddle/Paddle/pull/65375), [#65376](https://github.com/PaddlePaddle/Paddle/pull/65376), [#65464](https://github.com/PaddlePaddle/Paddle/pull/65464), [#65197](https://github.com/PaddlePaddle/Paddle/pull/65197), [#65455](https://github.com/PaddlePaddle/Paddle/pull/65455), [#65457](https://github.com/PaddlePaddle/Paddle/pull/65457), [#65487](https://github.com/PaddlePaddle/Paddle/pull/65487), [#65486](https://github.com/PaddlePaddle/Paddle/pull/65486), [#65547](https://github.com/PaddlePaddle/Paddle/pull/65547), [#65504](https://github.com/PaddlePaddle/Paddle/pull/65504), [#65460](https://github.com/PaddlePaddle/Paddle/pull/65460), [#65183](https://github.com/PaddlePaddle/Paddle/pull/65183), [#65454](https://github.com/PaddlePaddle/Paddle/pull/65454), [#65559](https://github.com/PaddlePaddle/Paddle/pull/65559), [#65560](https://github.com/PaddlePaddle/Paddle/pull/65560), [#65570](https://github.com/PaddlePaddle/Paddle/pull/65570), [#65569](https://github.com/PaddlePaddle/Paddle/pull/65569), [#65566](https://github.com/PaddlePaddle/Paddle/pull/65566), [#65620](https://github.com/PaddlePaddle/Paddle/pull/65620), [#65568](https://github.com/PaddlePaddle/Paddle/pull/65568), [#65567](https://github.com/PaddlePaddle/Paddle/pull/65567), [#65660](https://github.com/PaddlePaddle/Paddle/pull/65660), [#65645](https://github.com/PaddlePaddle/Paddle/pull/65645), [#65600](https://github.com/PaddlePaddle/Paddle/pull/65600), [#65532](https://github.com/PaddlePaddle/Paddle/pull/65532), 
[#65765](https://github.com/PaddlePaddle/Paddle/pull/65765), [#65767](https://github.com/PaddlePaddle/Paddle/pull/65767), [#65770](https://github.com/PaddlePaddle/Paddle/pull/65770), [#65768](https://github.com/PaddlePaddle/Paddle/pull/65768), [#65771](https://github.com/PaddlePaddle/Paddle/pull/65771), [#65772](https://github.com/PaddlePaddle/Paddle/pull/65772), [#65774](https://github.com/PaddlePaddle/Paddle/pull/65774), [#65769](https://github.com/PaddlePaddle/Paddle/pull/65769), [#65773](https://github.com/PaddlePaddle/Paddle/pull/65773), [#65766](https://github.com/PaddlePaddle/Paddle/pull/65766), [#65776](https://github.com/PaddlePaddle/Paddle/pull/65776), [#65775](https://github.com/PaddlePaddle/Paddle/pull/65775), [#65755](https://github.com/PaddlePaddle/Paddle/pull/65755), [#65779](https://github.com/PaddlePaddle/Paddle/pull/65779), [#65777](https://github.com/PaddlePaddle/Paddle/pull/65777), [#65823](https://github.com/PaddlePaddle/Paddle/pull/65823), [#65807](https://github.com/PaddlePaddle/Paddle/pull/65807), [#65821](https://github.com/PaddlePaddle/Paddle/pull/65821), [#65819](https://github.com/PaddlePaddle/Paddle/pull/65819), [#65810](https://github.com/PaddlePaddle/Paddle/pull/65810), [#65808](https://github.com/PaddlePaddle/Paddle/pull/65808), [#65824](https://github.com/PaddlePaddle/Paddle/pull/65824), [#65553](https://github.com/PaddlePaddle/Paddle/pull/65553), [#65818](https://github.com/PaddlePaddle/Paddle/pull/65818), [#65812](https://github.com/PaddlePaddle/Paddle/pull/65812), [#65803](https://github.com/PaddlePaddle/Paddle/pull/65803), [#65865](https://github.com/PaddlePaddle/Paddle/pull/65865), [#65870](https://github.com/PaddlePaddle/Paddle/pull/65870), [#65866](https://github.com/PaddlePaddle/Paddle/pull/65866), [#65844](https://github.com/PaddlePaddle/Paddle/pull/65844), [#65845](https://github.com/PaddlePaddle/Paddle/pull/65845), [#65853](https://github.com/PaddlePaddle/Paddle/pull/65853), 
[#65874](https://github.com/PaddlePaddle/Paddle/pull/65874), [#65871](https://github.com/PaddlePaddle/Paddle/pull/65871), [#65809](https://github.com/PaddlePaddle/Paddle/pull/65809), [#65867](https://github.com/PaddlePaddle/Paddle/pull/65867), [#65822](https://github.com/PaddlePaddle/Paddle/pull/65822), [#65872](https://github.com/PaddlePaddle/Paddle/pull/65872), [#65873](https://github.com/PaddlePaddle/Paddle/pull/65873), [#65869](https://github.com/PaddlePaddle/Paddle/pull/65869), [#65868](https://github.com/PaddlePaddle/Paddle/pull/65868), [#65849](https://github.com/PaddlePaddle/Paddle/pull/65849), [#65875](https://github.com/PaddlePaddle/Paddle/pull/65875), [#65876](https://github.com/PaddlePaddle/Paddle/pull/65876), [#65843](https://github.com/PaddlePaddle/Paddle/pull/65843), [#65727](https://github.com/PaddlePaddle/Paddle/pull/65727), [#65587](https://github.com/PaddlePaddle/Paddle/pull/65587), [#66006](https://github.com/PaddlePaddle/Paddle/pull/66006), [#66005](https://github.com/PaddlePaddle/Paddle/pull/66005), [#65785](https://github.com/PaddlePaddle/Paddle/pull/65785), [#65784](https://github.com/PaddlePaddle/Paddle/pull/65784), [#65811](https://github.com/PaddlePaddle/Paddle/pull/65811), [#65919](https://github.com/PaddlePaddle/Paddle/pull/65919), [#65838](https://github.com/PaddlePaddle/Paddle/pull/65838), [#65852](https://github.com/PaddlePaddle/Paddle/pull/65852), [#65847](https://github.com/PaddlePaddle/Paddle/pull/65847), [#66014](https://github.com/PaddlePaddle/Paddle/pull/66014), [#65805](https://github.com/PaddlePaddle/Paddle/pull/65805), [#66009](https://github.com/PaddlePaddle/Paddle/pull/66009), [#66012](https://github.com/PaddlePaddle/Paddle/pull/66012), [#65633](https://github.com/PaddlePaddle/Paddle/pull/65633), [#66011](https://github.com/PaddlePaddle/Paddle/pull/66011), [#66010](https://github.com/PaddlePaddle/Paddle/pull/66010), [#66013](https://github.com/PaddlePaddle/Paddle/pull/66013), 
[#66015](https://github.com/PaddlePaddle/Paddle/pull/66015), [#66016](https://github.com/PaddlePaddle/Paddle/pull/66016), [#66030](https://github.com/PaddlePaddle/Paddle/pull/66030), [#66028](https://github.com/PaddlePaddle/Paddle/pull/66028), [#66029](https://github.com/PaddlePaddle/Paddle/pull/66029), [#66054](https://github.com/PaddlePaddle/Paddle/pull/66054), [#66040](https://github.com/PaddlePaddle/Paddle/pull/66040), [#65993](https://github.com/PaddlePaddle/Paddle/pull/65993), [#66058](https://github.com/PaddlePaddle/Paddle/pull/66058), [#66280](https://github.com/PaddlePaddle/Paddle/pull/66280), [#66037](https://github.com/PaddlePaddle/Paddle/pull/66037), [#66057](https://github.com/PaddlePaddle/Paddle/pull/66057), [#66077](https://github.com/PaddlePaddle/Paddle/pull/66077), [#66051](https://github.com/PaddlePaddle/Paddle/pull/66051), [#65912](https://github.com/PaddlePaddle/Paddle/pull/65912), [#66090](https://github.com/PaddlePaddle/Paddle/pull/66090), [#66189](https://github.com/PaddlePaddle/Paddle/pull/66189), [#66127](https://github.com/PaddlePaddle/Paddle/pull/66127), [#66277](https://github.com/PaddlePaddle/Paddle/pull/66277), [#66119](https://github.com/PaddlePaddle/Paddle/pull/66119), [#66270](https://github.com/PaddlePaddle/Paddle/pull/66270), [#66305](https://github.com/PaddlePaddle/Paddle/pull/66305), [#66306](https://github.com/PaddlePaddle/Paddle/pull/66306), [#66279](https://github.com/PaddlePaddle/Paddle/pull/66279), [#66276](https://github.com/PaddlePaddle/Paddle/pull/66276), [#66295](https://github.com/PaddlePaddle/Paddle/pull/66295), [#66301](https://github.com/PaddlePaddle/Paddle/pull/66301), [#66473](https://github.com/PaddlePaddle/Paddle/pull/66473), [#66384](https://github.com/PaddlePaddle/Paddle/pull/66384), [#66505](https://github.com/PaddlePaddle/Paddle/pull/66505), [#66328](https://github.com/PaddlePaddle/Paddle/pull/66328), [#66394](https://github.com/PaddlePaddle/Paddle/pull/66394), 
[#66392](https://github.com/PaddlePaddle/Paddle/pull/66392), [#66432](https://github.com/PaddlePaddle/Paddle/pull/66432), [#66575](https://github.com/PaddlePaddle/Paddle/pull/66575), [#66572](https://github.com/PaddlePaddle/Paddle/pull/66572), [#66656](https://github.com/PaddlePaddle/Paddle/pull/66656), [#66475](https://github.com/PaddlePaddle/Paddle/pull/66475), [#66654](https://github.com/PaddlePaddle/Paddle/pull/66654), [#66616](https://github.com/PaddlePaddle/Paddle/pull/66616), [#66694](https://github.com/PaddlePaddle/Paddle/pull/66694), [#66686](https://github.com/PaddlePaddle/Paddle/pull/66686), [#66766](https://github.com/PaddlePaddle/Paddle/pull/66766), [#66749](https://github.com/PaddlePaddle/Paddle/pull/66749), [#66760](https://github.com/PaddlePaddle/Paddle/pull/66760), [#66803](https://github.com/PaddlePaddle/Paddle/pull/66803), [#66770](https://github.com/PaddlePaddle/Paddle/pull/66770), [#66693](https://github.com/PaddlePaddle/Paddle/pull/66693), [#66771](https://github.com/PaddlePaddle/Paddle/pull/66771), [#66792](https://github.com/PaddlePaddle/Paddle/pull/66792), [#66862](https://github.com/PaddlePaddle/Paddle/pull/66862), [#66867](https://github.com/PaddlePaddle/Paddle/pull/66867), [#66684](https://github.com/PaddlePaddle/Paddle/pull/66684), [#66966](https://github.com/PaddlePaddle/Paddle/pull/66966), [#66793](https://github.com/PaddlePaddle/Paddle/pull/66793), [#66987](https://github.com/PaddlePaddle/Paddle/pull/66987), [#66985](https://github.com/PaddlePaddle/Paddle/pull/66985), [#66989](https://github.com/PaddlePaddle/Paddle/pull/66989), [#66639](https://github.com/PaddlePaddle/Paddle/pull/66639), [#66994](https://github.com/PaddlePaddle/Paddle/pull/66994), [#66986](https://github.com/PaddlePaddle/Paddle/pull/66986), [#66993](https://github.com/PaddlePaddle/Paddle/pull/66993), [#67002](https://github.com/PaddlePaddle/Paddle/pull/67002), [#66996](https://github.com/PaddlePaddle/Paddle/pull/66996), 
[#67001](https://github.com/PaddlePaddle/Paddle/pull/67001), [#66864](https://github.com/PaddlePaddle/Paddle/pull/66864), [#67031](https://github.com/PaddlePaddle/Paddle/pull/67031), [#67089](https://github.com/PaddlePaddle/Paddle/pull/67089), [#67143](https://github.com/PaddlePaddle/Paddle/pull/67143), [#67179](https://github.com/PaddlePaddle/Paddle/pull/67179), [#67178](https://github.com/PaddlePaddle/Paddle/pull/67178), [#67284](https://github.com/PaddlePaddle/Paddle/pull/67284), [#67104](https://github.com/PaddlePaddle/Paddle/pull/67104), [#67079](https://github.com/PaddlePaddle/Paddle/pull/67079), [#67132](https://github.com/PaddlePaddle/Paddle/pull/67132), [#67147](https://github.com/PaddlePaddle/Paddle/pull/67147), [#67204](https://github.com/PaddlePaddle/Paddle/pull/67204), [#67112](https://github.com/PaddlePaddle/Paddle/pull/67112), [#67233](https://github.com/PaddlePaddle/Paddle/pull/67233), [#67366](https://github.com/PaddlePaddle/Paddle/pull/67366), [#67067](https://github.com/PaddlePaddle/Paddle/pull/67067), [#67391](https://github.com/PaddlePaddle/Paddle/pull/67391), [#67428](https://github.com/PaddlePaddle/Paddle/pull/67428), [#67197](https://github.com/PaddlePaddle/Paddle/pull/67197), [#67047](https://github.com/PaddlePaddle/Paddle/pull/67047), [#66890](https://github.com/PaddlePaddle/Paddle/pull/66890), [#67159](https://github.com/PaddlePaddle/Paddle/pull/67159), [#67439](https://github.com/PaddlePaddle/Paddle/pull/67439), [#67555](https://github.com/PaddlePaddle/Paddle/pull/67555), [#67448](https://github.com/PaddlePaddle/Paddle/pull/67448), [#67556](https://github.com/PaddlePaddle/Paddle/pull/67556), [#67469](https://github.com/PaddlePaddle/Paddle/pull/67469), [#67558](https://github.com/PaddlePaddle/Paddle/pull/67558), [#67405](https://github.com/PaddlePaddle/Paddle/pull/67405), [#67644](https://github.com/PaddlePaddle/Paddle/pull/67644), [#67624](https://github.com/PaddlePaddle/Paddle/pull/67624), 
[#67679](https://github.com/PaddlePaddle/Paddle/pull/67679), [#67677](https://github.com/PaddlePaddle/Paddle/pull/67677), [#67785](https://github.com/PaddlePaddle/Paddle/pull/67785), [#67767](https://github.com/PaddlePaddle/Paddle/pull/67767), [#65319](https://github.com/PaddlePaddle/Paddle/pull/65319), [#65277](https://github.com/PaddlePaddle/Paddle/pull/65277), [#67673](https://github.com/PaddlePaddle/Paddle/pull/67673), [#65557](https://github.com/PaddlePaddle/Paddle/pull/65557), [#67527](https://github.com/PaddlePaddle/Paddle/pull/67527), [#66965](https://github.com/PaddlePaddle/Paddle/pull/66965), [#65905](https://github.com/PaddlePaddle/Paddle/pull/65905), [#65657](https://github.com/PaddlePaddle/Paddle/pull/65657), [#66357](https://github.com/PaddlePaddle/Paddle/pull/66357), [#68163](https://github.com/PaddlePaddle/Paddle/pull/68163)
-- Improved the error messages of many PaddlePaddle APIs so that errors are easier to understand. [#67148](https://github.com/PaddlePaddle/Paddle/pull/67148), [#67154](https://github.com/PaddlePaddle/Paddle/pull/67154), [#67546](https://github.com/PaddlePaddle/Paddle/pull/67546), [#67335](https://github.com/PaddlePaddle/Paddle/pull/67335), [#67255](https://github.com/PaddlePaddle/Paddle/pull/67255), [#67099](https://github.com/PaddlePaddle/Paddle/pull/67099), [#67074](https://github.com/PaddlePaddle/Paddle/pull/67074), [#67073](https://github.com/PaddlePaddle/Paddle/pull/67073), [#66957](https://github.com/PaddlePaddle/Paddle/pull/66957), [#67063](https://github.com/PaddlePaddle/Paddle/pull/67063), [#67575](https://github.com/PaddlePaddle/Paddle/pull/67575), [#67608](https://github.com/PaddlePaddle/Paddle/pull/67608), [#67634](https://github.com/PaddlePaddle/Paddle/pull/67634), [#67325](https://github.com/PaddlePaddle/Paddle/pull/67325), [#67429](https://github.com/PaddlePaddle/Paddle/pull/67429), [#67401](https://github.com/PaddlePaddle/Paddle/pull/67401), [#66881](https://github.com/PaddlePaddle/Paddle/pull/66881), [#68492](https://github.com/PaddlePaddle/Paddle/pull/68492), [#67695](https://github.com/PaddlePaddle/Paddle/pull/67695), [#69833](https://github.com/PaddlePaddle/Paddle/pull/69833), [#70398](https://github.com/PaddlePaddle/Paddle/pull/70398)
+- Added the `paddle.randn_like` API. [#72492](https://github.com/PaddlePaddle/Paddle/pull/72492)
### Bug Fixes
-- Fixed a bug in `paddle.nn.functional.max_unpool1d` when the `output_size` input is a tuple. [#65910](https://github.com/PaddlePaddle/Paddle/pull/65910)
-- Fixed `paddle.base.core.eager.Tensor` not supporting `paddle::DataType`. [#66765](https://github.com/PaddlePaddle/Paddle/pull/66765)
-- Fixed an error in bf16 training when the PIR switch is enabled. [#66833](https://github.com/PaddlePaddle/Paddle/pull/66833)
-- Fixed an issue with the linear-layer bias in pipeline parallelism. [#67212](https://github.com/PaddlePaddle/Paddle/pull/67212)
-- Fixed an error raised when the loss is used in a conditional check under pipeline parallelism. [#66980](https://github.com/PaddlePaddle/Paddle/pull/66980)
-- Fixed an error raised when `paddle.Tensor.item` is used under pipeline parallelism. [#67441](https://github.com/PaddlePaddle/Paddle/pull/67441)
-- Fixed a bug in `paddle.einsum` in specific scenarios. [#67588](https://github.com/PaddlePaddle/Paddle/pull/67588)
-- Fixed an error in `paddle.nn.SyncBatchNorm` during gradient computation. [#67559](https://github.com/PaddlePaddle/Paddle/pull/67559)
-- Fixed the problem reported in [issue #69992](https://github.com/PaddlePaddle/Paddle/issues/69992). [#70017](https://github.com/PaddlePaddle/Paddle/pull/70017)
-- Fixed incorrect results from `paddle.arange` when given large integers. [#70188](https://github.com/PaddlePaddle/Paddle/pull/70188)
-- Fixed incorrect NaN propagation in `paddle.max` and `paddle.min` when the input contains NaN. [#70049](https://github.com/PaddlePaddle/Paddle/pull/70049)
-- Fixed issues in `paddle.linalg.svd`, `paddle.linalg.any`, and other APIs when handling 0-size Tensors. [#70235](https://github.com/PaddlePaddle/Paddle/pull/70235), [#70489](https://github.com/PaddlePaddle/Paddle/pull/70489), [#70047](https://github.com/PaddlePaddle/Paddle/pull/70047), [#70103](https://github.com/PaddlePaddle/Paddle/pull/70103), [#70127](https://github.com/PaddlePaddle/Paddle/pull/70127), [#70098](https://github.com/PaddlePaddle/Paddle/pull/70098), [#70077](https://github.com/PaddlePaddle/Paddle/pull/70077), [#70130](https://github.com/PaddlePaddle/Paddle/pull/70130), [#70254](https://github.com/PaddlePaddle/Paddle/pull/70254), [#70125](https://github.com/PaddlePaddle/Paddle/pull/70125), [#70342](https://github.com/PaddlePaddle/Paddle/pull/70342), [#70369](https://github.com/PaddlePaddle/Paddle/pull/70369), [#71094](https://github.com/PaddlePaddle/Paddle/pull/71094), [#71089](https://github.com/PaddlePaddle/Paddle/pull/71089), [#71185](https://github.com/PaddlePaddle/Paddle/pull/71185), [#70537](https://github.com/PaddlePaddle/Paddle/pull/70537), [#70481](https://github.com/PaddlePaddle/Paddle/pull/70481)
-- Fixed a number of type-hint annotation issues, documentation issues, and similar problems. [#65429](https://github.com/PaddlePaddle/Paddle/pull/65429), [#65496](https://github.com/PaddlePaddle/Paddle/pull/65496), [#65461](https://github.com/PaddlePaddle/Paddle/pull/65461), [#65542](https://github.com/PaddlePaddle/Paddle/pull/65542), [#65575](https://github.com/PaddlePaddle/Paddle/pull/65575), [#65545](https://github.com/PaddlePaddle/Paddle/pull/65545), [#65609](https://github.com/PaddlePaddle/Paddle/pull/65609), [#65644](https://github.com/PaddlePaddle/Paddle/pull/65644), [#65700](https://github.com/PaddlePaddle/Paddle/pull/65700), [#65697](https://github.com/PaddlePaddle/Paddle/pull/65697), [#65719](https://github.com/PaddlePaddle/Paddle/pull/65719), [#65639](https://github.com/PaddlePaddle/Paddle/pull/65639), [#65742](https://github.com/PaddlePaddle/Paddle/pull/65742), [#65891](https://github.com/PaddlePaddle/Paddle/pull/65891), [#65877](https://github.com/PaddlePaddle/Paddle/pull/65877), [#65895](https://github.com/PaddlePaddle/Paddle/pull/65895), [#66007](https://github.com/PaddlePaddle/Paddle/pull/66007), [#66679](https://github.com/PaddlePaddle/Paddle/pull/66679), [#66680](https://github.com/PaddlePaddle/Paddle/pull/66680), [#66676](https://github.com/PaddlePaddle/Paddle/pull/66676), [#66677](https://github.com/PaddlePaddle/Paddle/pull/66677), [#66884](https://github.com/PaddlePaddle/Paddle/pull/66884), [#67288](https://github.com/PaddlePaddle/Paddle/pull/67288), [#67302](https://github.com/PaddlePaddle/Paddle/pull/67302), [#66978](https://github.com/PaddlePaddle/Paddle/pull/66978), [#67295](https://github.com/PaddlePaddle/Paddle/pull/67295), [#67520](https://github.com/PaddlePaddle/Paddle/pull/67520), [#67421](https://github.com/PaddlePaddle/Paddle/pull/67421), [#67529](https://github.com/PaddlePaddle/Paddle/pull/67529), [#67536](https://github.com/PaddlePaddle/Paddle/pull/67536), [#67618](https://github.com/PaddlePaddle/Paddle/pull/67618), [#67661](https://github.com/PaddlePaddle/Paddle/pull/67661), 
[#67698](https://github.com/PaddlePaddle/Paddle/pull/67698), [#67800](https://github.com/PaddlePaddle/Paddle/pull/67800), [#67933](https://github.com/PaddlePaddle/Paddle/pull/67933), [#67893](https://github.com/PaddlePaddle/Paddle/pull/67893), [#68108](https://github.com/PaddlePaddle/Paddle/pull/68108), [#67927](https://github.com/PaddlePaddle/Paddle/pull/67927), [#68322](https://github.com/PaddlePaddle/Paddle/pull/68322), [#68341](https://github.com/PaddlePaddle/Paddle/pull/68341), [#68415](https://github.com/PaddlePaddle/Paddle/pull/68415), [#68372](https://github.com/PaddlePaddle/Paddle/pull/68372), [#68559](https://github.com/PaddlePaddle/Paddle/pull/68559), [#68598](https://github.com/PaddlePaddle/Paddle/pull/68598), [#68708](https://github.com/PaddlePaddle/Paddle/pull/68708), [#68780](https://github.com/PaddlePaddle/Paddle/pull/68780), [#68992](https://github.com/PaddlePaddle/Paddle/pull/68992), [#68989](https://github.com/PaddlePaddle/Paddle/pull/68989), [#68895](https://github.com/PaddlePaddle/Paddle/pull/68895), [#69014](https://github.com/PaddlePaddle/Paddle/pull/69014), [#69139](https://github.com/PaddlePaddle/Paddle/pull/69139), [#68996](https://github.com/PaddlePaddle/Paddle/pull/68996), [#69090](https://github.com/PaddlePaddle/Paddle/pull/69090), [#68922](https://github.com/PaddlePaddle/Paddle/pull/68922), [#69333](https://github.com/PaddlePaddle/Paddle/pull/69333), [#69141](https://github.com/PaddlePaddle/Paddle/pull/69141), [#69609](https://github.com/PaddlePaddle/Paddle/pull/69609), [#69652](https://github.com/PaddlePaddle/Paddle/pull/69652), [#69715](https://github.com/PaddlePaddle/Paddle/pull/69715), [#69716](https://github.com/PaddlePaddle/Paddle/pull/69716), [#69934](https://github.com/PaddlePaddle/Paddle/pull/69934), [#70253](https://github.com/PaddlePaddle/Paddle/pull/70253), [#70297](https://github.com/PaddlePaddle/Paddle/pull/70297), [#70252](https://github.com/PaddlePaddle/Paddle/pull/70252), 
[#70468](https://github.com/PaddlePaddle/Paddle/pull/70468), [#70102](https://github.com/PaddlePaddle/Paddle/pull/70102), [#70546](https://github.com/PaddlePaddle/Paddle/pull/70546), [#70616](https://github.com/PaddlePaddle/Paddle/pull/70616), [#70582](https://github.com/PaddlePaddle/Paddle/pull/70582), [#70635](https://github.com/PaddlePaddle/Paddle/pull/70635), [#70499](https://github.com/PaddlePaddle/Paddle/pull/70499), [#70755](https://github.com/PaddlePaddle/Paddle/pull/70755), [#70935](https://github.com/PaddlePaddle/Paddle/pull/70935), [#71133](https://github.com/PaddlePaddle/Paddle/pull/71133), [#71172](https://github.com/PaddlePaddle/Paddle/pull/71172), [#71238](https://github.com/PaddlePaddle/Paddle/pull/71238), [#71230](https://github.com/PaddlePaddle/Paddle/pull/71230), [#71394](https://github.com/PaddlePaddle/Paddle/pull/71394)
+- Fixed inconsistent input and output types in the `tensordot` API. [#72139](https://github.com/PaddlePaddle/Paddle/pull/72139)
+- Fixed an issue in the `atleast` API when the output is a list of Tensors. [#73102](https://github.com/PaddlePaddle/Paddle/pull/73102)
+- Fixed an issue in the `nonzero` API. [#72003](https://github.com/PaddlePaddle/Paddle/pull/72003)
+- Fixed a memory leak in `dualpipev`. [#72070](https://github.com/PaddlePaddle/Paddle/pull/72070)
+- Fixed a computation overflow in `softmax`. [#71935](https://github.com/PaddlePaddle/Paddle/pull/71935)
+- Fixed the shape check in `take_along_axis` when `broadcast=False`. [#72436](https://github.com/PaddlePaddle/Paddle/pull/72436)
+- Fixed incorrect handling of NaN inputs in `maximum` and `minimum`. [#71933](https://github.com/PaddlePaddle/Paddle/pull/71933)
+- Fixed an issue with `visit_type`. [#72782](https://github.com/PaddlePaddle/Paddle/pull/72782)
+- Fixed an int32 out-of-bounds issue in `gather_scatter_functor`. [#72905](https://github.com/PaddlePaddle/Paddle/pull/72905)
+- Fixed the inplace implementation of `Bernoulli`. [#73271](https://github.com/PaddlePaddle/Paddle/pull/73271)
+- Fixed issues in `moe_permute` and `moe_unpermute`. [#73365](https://github.com/PaddlePaddle/Paddle/pull/73365)
+- Fixed the `ast.parse` syntax check for pyi files. [#71872](https://github.com/PaddlePaddle/Paddle/pull/71872)
+- Fixed an issue with complex division. [#73331](https://github.com/PaddlePaddle/Paddle/pull/73331)
+- Fixed issues related to TensorRT integration. [#72302](https://github.com/PaddlePaddle/Paddle/pull/72302), [#72278](https://github.com/PaddlePaddle/Paddle/pull/72278)
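
As background for the `softmax` overflow entry in the list above, the sketch below shows the general failure mode and the standard remedy in plain Python. It is an illustration only: the function names `naive_softmax` and `stable_softmax` are hypothetical, and nothing here is taken from the actual change in the linked pull request, whose details are not reproduced in these notes.

```python
import math


def naive_softmax(xs):
    # Exponentiates the raw inputs; math.exp overflows for large values
    # (for example 1000.0) and raises OverflowError.
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def stable_softmax(xs):
    # Subtracting the maximum first keeps every exponent <= 0, so exp()
    # stays within (0, 1]. The result is mathematically unchanged because
    # the common factor exp(-max) cancels in the ratio.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    big = [1000.0, 1001.0, 1002.0]
    try:
        naive_softmax(big)
    except OverflowError as err:
        print("naive version failed:", err)
    print("stable version:", stable_softmax(big))
```

Because softmax is invariant to adding a constant to every input, the stable variant returns the same probabilities as the naive one wherever the naive one does not overflow.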
+
+### Enhancements
+
+- Enhanced API functionality to improve usability and the user experience. Changes include, but are not limited to, extending the data types APIs support, adding API parameter checks, correcting API parameter defaults, and refining API return values. [#71997](https://github.com/PaddlePaddle/Paddle/pull/71997), [#72911](https://github.com/PaddlePaddle/Paddle/pull/72911), [#72985](https://github.com/PaddlePaddle/Paddle/pull/72985), [#73240](https://github.com/PaddlePaddle/Paddle/pull/73240), [#72927](https://github.com/PaddlePaddle/Paddle/pull/72927), [#73451](https://github.com/PaddlePaddle/Paddle/pull/73451), [#73416](https://github.com/PaddlePaddle/Paddle/pull/73416), [#73420](https://github.com/PaddlePaddle/Paddle/pull/73420), [#73347](https://github.com/PaddlePaddle/Paddle/pull/73347), [#73050](https://github.com/PaddlePaddle/Paddle/pull/73050), [#73246](https://github.com/PaddlePaddle/Paddle/pull/73246), [#73123](https://github.com/PaddlePaddle/Paddle/pull/73123), [#73336](https://github.com/PaddlePaddle/Paddle/pull/73336), [#73062](https://github.com/PaddlePaddle/Paddle/pull/73062), [#72201](https://github.com/PaddlePaddle/Paddle/pull/72201), [#72190](https://github.com/PaddlePaddle/Paddle/pull/72190)
+- Enhanced API support for complex types. [#72279](https://github.com/PaddlePaddle/Paddle/pull/72279), [#72308](https://github.com/PaddlePaddle/Paddle/pull/72308), [#72518](https://github.com/PaddlePaddle/Paddle/pull/72518), [#72391](https://github.com/PaddlePaddle/Paddle/pull/72391), [#72239](https://github.com/PaddlePaddle/Paddle/pull/72239), [#72286](https://github.com/PaddlePaddle/Paddle/pull/72286), [#72169](https://github.com/PaddlePaddle/Paddle/pull/72169), [#72577](https://github.com/PaddlePaddle/Paddle/pull/72577), [#72619](https://github.com/PaddlePaddle/Paddle/pull/72619)
+- Enhanced API support for 0-Size Tensors. [#72570](https://github.com/PaddlePaddle/Paddle/pull/72570), [#72692](https://github.com/PaddlePaddle/Paddle/pull/72692), [#72138](https://github.com/PaddlePaddle/Paddle/pull/72138), [#72410](https://github.com/PaddlePaddle/Paddle/pull/72410), [#72565](https://github.com/PaddlePaddle/Paddle/pull/72565), [#72262](https://github.com/PaddlePaddle/Paddle/pull/72262)
+- Corrected spelling errors in API code to improve overall accuracy and professionalism. [#71780](https://github.com/PaddlePaddle/Paddle/pull/71780), [#71786](https://github.com/PaddlePaddle/Paddle/pull/71786), [#72093](https://github.com/PaddlePaddle/Paddle/pull/72093), [#72113](https://github.com/PaddlePaddle/Paddle/pull/72113), [#72241](https://github.com/PaddlePaddle/Paddle/pull/72241), [#72237](https://github.com/PaddlePaddle/Paddle/pull/72237), [#72590](https://github.com/PaddlePaddle/Paddle/pull/72590), [#72591](https://github.com/PaddlePaddle/Paddle/pull/72591), [#72769](https://github.com/PaddlePaddle/Paddle/pull/72769), [#72858](https://github.com/PaddlePaddle/Paddle/pull/72858), [#73045](https://github.com/PaddlePaddle/Paddle/pull/73045), [#72195](https://github.com/PaddlePaddle/Paddle/pull/72195), [#72627](https://github.com/PaddlePaddle/Paddle/pull/72627), [#72657](https://github.com/PaddlePaddle/Paddle/pull/72657), [#73162](https://github.com/PaddlePaddle/Paddle/pull/73162), [#73402](https://github.com/PaddlePaddle/Paddle/pull/73402), [#72208](https://github.com/PaddlePaddle/Paddle/pull/72208), [#72659](https://github.com/PaddlePaddle/Paddle/pull/72659), [#72658](https://github.com/PaddlePaddle/Paddle/pull/72658), [#72660](https://github.com/PaddlePaddle/Paddle/pull/72660), [#72661](https://github.com/PaddlePaddle/Paddle/pull/72661), [#72656](https://github.com/PaddlePaddle/Paddle/pull/72656)
+- Optimized communication to reduce peak GPU memory usage. [#72035](https://github.com/PaddlePaddle/Paddle/pull/72035)
+
+### Documentation
+
+- Corrected errors in the documentation, improving its usability and the user experience. [#72549](https://github.com/PaddlePaddle/Paddle/pull/72549), [#73036](https://github.com/PaddlePaddle/Paddle/pull/73036)
-### Documentation Improvements
+### Developer-Related
-- Improved several API documents to make them easier to read and understand. [#67772](https://github.com/PaddlePaddle/Paddle/pull/67772), [#69895](https://github.com/PaddlePaddle/Paddle/pull/69895), [#65904](https://github.com/PaddlePaddle/Paddle/pull/65904), [#66480](https://github.com/PaddlePaddle/Paddle/pull/66480), [#66974](https://github.com/PaddlePaddle/Paddle/pull/66974), [#67100](https://github.com/PaddlePaddle/Paddle/pull/67100), [#66991](https://github.com/PaddlePaddle/Paddle/pull/66991), [#67287](https://github.com/PaddlePaddle/Paddle/pull/67287), [#67841](https://github.com/PaddlePaddle/Paddle/pull/67841), [#68206](https://github.com/PaddlePaddle/Paddle/pull/68206), [#68305](https://github.com/PaddlePaddle/Paddle/pull/68305), [#68462](https://github.com/PaddlePaddle/Paddle/pull/68462), [#67061](https://github.com/PaddlePaddle/Paddle/pull/67061), [#66503](https://github.com/PaddlePaddle/Paddle/pull/66503), [#68856](https://github.com/PaddlePaddle/Paddle/pull/68856), [#68866](https://github.com/PaddlePaddle/Paddle/pull/68866), [#68768](https://github.com/PaddlePaddle/Paddle/pull/68768), [#69215](https://github.com/PaddlePaddle/Paddle/pull/69215), [#69449](https://github.com/PaddlePaddle/Paddle/pull/69449), [#69396](https://github.com/PaddlePaddle/Paddle/pull/69396), [#69498](https://github.com/PaddlePaddle/Paddle/pull/69498), [#69413](https://github.com/PaddlePaddle/Paddle/pull/69413), [#69404](https://github.com/PaddlePaddle/Paddle/pull/69404), [#69729](https://github.com/PaddlePaddle/Paddle/pull/69729), [#69749](https://github.com/PaddlePaddle/Paddle/pull/69749), [#69266](https://github.com/PaddlePaddle/Paddle/pull/69266), [#69989](https://github.com/PaddlePaddle/Paddle/pull/69989), [#70209](https://github.com/PaddlePaddle/Paddle/pull/70209), [#70128](https://github.com/PaddlePaddle/Paddle/pull/70128), [#70143](https://github.com/PaddlePaddle/Paddle/pull/70143), [#69874](https://github.com/PaddlePaddle/Paddle/pull/69874), [#70242](https://github.com/PaddlePaddle/Paddle/pull/70242), 
[#70145](https://github.com/PaddlePaddle/Paddle/pull/70145), [#70813](https://github.com/PaddlePaddle/Paddle/pull/70813), [#71046](https://github.com/PaddlePaddle/Paddle/pull/71046)
+- Updated the code style check rules. [#72896](https://github.com/PaddlePaddle/Paddle/pull/72896), [#73179](https://github.com/PaddlePaddle/Paddle/pull/73179), [#73060](https://github.com/PaddlePaddle/Paddle/pull/73060), [#72553](https://github.com/PaddlePaddle/Paddle/pull/72553), [#72915](https://github.com/PaddlePaddle/Paddle/pull/72915), [#72916](https://github.com/PaddlePaddle/Paddle/pull/72916), [#73338](https://github.com/PaddlePaddle/Paddle/pull/73338), [#72935](https://github.com/PaddlePaddle/Paddle/pull/72935), [#72325](https://github.com/PaddlePaddle/Paddle/pull/72325)
+- Updated variable naming in the code and migrated code. [#73048](https://github.com/PaddlePaddle/Paddle/pull/73048), [#73148](https://github.com/PaddlePaddle/Paddle/pull/73148), [#73149](https://github.com/PaddlePaddle/Paddle/pull/73149), [#73264](https://github.com/PaddlePaddle/Paddle/pull/73264), [#73159](https://github.com/PaddlePaddle/Paddle/pull/73159), [#73124](https://github.com/PaddlePaddle/Paddle/pull/73124), [#73160](https://github.com/PaddlePaddle/Paddle/pull/73160), [#73161](https://github.com/PaddlePaddle/Paddle/pull/73161), [#73374](https://github.com/PaddlePaddle/Paddle/pull/73374), [#73395](https://github.com/PaddlePaddle/Paddle/pull/73395), [#73076](https://github.com/PaddlePaddle/Paddle/pull/73076), [#73163](https://github.com/PaddlePaddle/Paddle/pull/73163), [#73255](https://github.com/PaddlePaddle/Paddle/pull/73255)
+- Retired LodTensor. [#71968](https://github.com/PaddlePaddle/Paddle/pull/71968), [#72152](https://github.com/PaddlePaddle/Paddle/pull/72152), [#72145](https://github.com/PaddlePaddle/Paddle/pull/72145)
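-## 2. Basic Execution Architecture
+### Deprecated Code Cleanup
-PIR is now fully rolled out and enabled by default, with support for one-click dynamic-to-static conversion, ensuring excellent framework performance and good extensibility.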
+- 无用代码清理。[#71795](https://github.com/PaddlePaddle/Paddle/pull/71795), [#71792](https://github.com/PaddlePaddle/Paddle/pull/71792), [#71794](https://github.com/PaddlePaddle/Paddle/pull/71794), [#71793](https://github.com/PaddlePaddle/Paddle/pull/71793), [#72265](https://github.com/PaddlePaddle/Paddle/pull/72265), [#73167](https://github.com/PaddlePaddle/Paddle/pull/73167), [#73115](https://github.com/PaddlePaddle/Paddle/pull/73115), [#73049](https://github.com/PaddlePaddle/Paddle/pull/73049), [#72162](https://github.com/PaddlePaddle/Paddle/pull/72162), [#72321](https://github.com/PaddlePaddle/Paddle/pull/72321), [#72336](https://github.com/PaddlePaddle/Paddle/pull/72336), [#72952](https://github.com/PaddlePaddle/Paddle/pull/72952), [#72828](https://github.com/PaddlePaddle/Paddle/pull/72828)
-### Bug 修复
+## 2. Basic execution architecture
-- 修复参数配置导致的精度问题。 [#65814](https://github.com/PaddlePaddle/Paddle/pull/65814)
-- 修复 save/load 相关 Bug。 [#65268](https://github.com/PaddlePaddle/Paddle/pull/65268), [#65359](https://github.com/PaddlePaddle/Paddle/pull/65359), [#65373](https://github.com/PaddlePaddle/Paddle/pull/65373), [#65314](https://github.com/PaddlePaddle/Paddle/pull/65314), [#65446](https://github.com/PaddlePaddle/Paddle/pull/65446), [#65476](https://github.com/PaddlePaddle/Paddle/pull/65476), [#66891](https://github.com/PaddlePaddle/Paddle/pull/66891), [#66931](https://github.com/PaddlePaddle/Paddle/pull/66931), [#65978](https://github.com/PaddlePaddle/Paddle/pull/65978), [#67654](https://github.com/PaddlePaddle/Paddle/pull/67654), [#67906](https://github.com/PaddlePaddle/Paddle/pull/67906), [#68723](https://github.com/PaddlePaddle/Paddle/pull/68723), [#71452](https://github.com/PaddlePaddle/Paddle/pull/71452), [#71457](https://github.com/PaddlePaddle/Paddle/pull/71457), [#67819](https://github.com/PaddlePaddle/Paddle/pull/67819), [#68120](https://github.com/PaddlePaddle/Paddle/pull/68120), [#68300](https://github.com/PaddlePaddle/Paddle/pull/68300), [#68315](https://github.com/PaddlePaddle/Paddle/pull/68315), [#68743](https://github.com/PaddlePaddle/Paddle/pull/68743), [#68744](https://github.com/PaddlePaddle/Paddle/pull/68744), [#69585](https://github.com/PaddlePaddle/Paddle/pull/69585), [#71165](https://github.com/PaddlePaddle/Paddle/pull/71165), [#71400](https://github.com/PaddlePaddle/Paddle/pull/71400)
-- 跳过/修复在 PIR 模式下的失败单测,包括 Windows、XPU 等场景。 [#65690](https://github.com/PaddlePaddle/Paddle/pull/65690), [#65759](https://github.com/PaddlePaddle/Paddle/pull/65759), [#65730](https://github.com/PaddlePaddle/Paddle/pull/65730), [#65760](https://github.com/PaddlePaddle/Paddle/pull/65760), [#65833](https://github.com/PaddlePaddle/Paddle/pull/65833), [#65834](https://github.com/PaddlePaddle/Paddle/pull/65834), [#65856](https://github.com/PaddlePaddle/Paddle/pull/65856), [#65886](https://github.com/PaddlePaddle/Paddle/pull/65886), [#65899](https://github.com/PaddlePaddle/Paddle/pull/65899), [#65932](https://github.com/PaddlePaddle/Paddle/pull/65932), [#65998](https://github.com/PaddlePaddle/Paddle/pull/65998), [#65953](https://github.com/PaddlePaddle/Paddle/pull/65953), [#65997](https://github.com/PaddlePaddle/Paddle/pull/65997), [#66061](https://github.com/PaddlePaddle/Paddle/pull/66061), [#66111](https://github.com/PaddlePaddle/Paddle/pull/66111), [#66137](https://github.com/PaddlePaddle/Paddle/pull/66137), [#66073](https://github.com/PaddlePaddle/Paddle/pull/66073), [#66203](https://github.com/PaddlePaddle/Paddle/pull/66203), [#66227](https://github.com/PaddlePaddle/Paddle/pull/66227), [#65744](https://github.com/PaddlePaddle/Paddle/pull/65744), [#66234](https://github.com/PaddlePaddle/Paddle/pull/66234), [#67487](https://github.com/PaddlePaddle/Paddle/pull/67487), [#67561](https://github.com/PaddlePaddle/Paddle/pull/67561), [#67584](https://github.com/PaddlePaddle/Paddle/pull/67584), [#67742](https://github.com/PaddlePaddle/Paddle/pull/67742), [#69832](https://github.com/PaddlePaddle/Paddle/pull/69832), [#65885](https://github.com/PaddlePaddle/Paddle/pull/65885), [#66709](https://github.com/PaddlePaddle/Paddle/pull/66709), [#66734](https://github.com/PaddlePaddle/Paddle/pull/66734), [#66959](https://github.com/PaddlePaddle/Paddle/pull/66959), [#67399](https://github.com/PaddlePaddle/Paddle/pull/67399), [#67389](https://github.com/PaddlePaddle/Paddle/pull/67389), 
[#67230](https://github.com/PaddlePaddle/Paddle/pull/67230), [#67403](https://github.com/PaddlePaddle/Paddle/pull/67403), [#67619](https://github.com/PaddlePaddle/Paddle/pull/67619), [#67662](https://github.com/PaddlePaddle/Paddle/pull/67662), [#67902](https://github.com/PaddlePaddle/Paddle/pull/67902), [#67382](https://github.com/PaddlePaddle/Paddle/pull/67382), [#67430](https://github.com/PaddlePaddle/Paddle/pull/67430), [#67517](https://github.com/PaddlePaddle/Paddle/pull/67517), [#67533](https://github.com/PaddlePaddle/Paddle/pull/67533), [#67573](https://github.com/PaddlePaddle/Paddle/pull/67573), [#67468](https://github.com/PaddlePaddle/Paddle/pull/67468), [#67640](https://github.com/PaddlePaddle/Paddle/pull/67640), [#67667](https://github.com/PaddlePaddle/Paddle/pull/67667), [#67716](https://github.com/PaddlePaddle/Paddle/pull/67716), [#68386](https://github.com/PaddlePaddle/Paddle/pull/68386), [#67234](https://github.com/PaddlePaddle/Paddle/pull/67234), [#67266](https://github.com/PaddlePaddle/Paddle/pull/67266), [#67362](https://github.com/PaddlePaddle/Paddle/pull/67362), [#67631](https://github.com/PaddlePaddle/Paddle/pull/67631), [#68081](https://github.com/PaddlePaddle/Paddle/pull/68081)
-- 修复动态图相关 Bug。 [#65619](https://github.com/PaddlePaddle/Paddle/pull/65619), [#69163](https://github.com/PaddlePaddle/Paddle/pull/69163), [#68862](https://github.com/PaddlePaddle/Paddle/pull/68862), [#68164](https://github.com/PaddlePaddle/Paddle/pull/68164), [#69867](https://github.com/PaddlePaddle/Paddle/pull/69867)
-- 修复控制流相关 Bug。 [#65722](https://github.com/PaddlePaddle/Paddle/pull/65722), [#70181](https://github.com/PaddlePaddle/Paddle/pull/70181)
-- 修复 kernel 运算相关 Bug,包括运算位置、空指针等。 [#66334](https://github.com/PaddlePaddle/Paddle/pull/66334), [#67931](https://github.com/PaddlePaddle/Paddle/pull/67931), [#70353](https://github.com/PaddlePaddle/Paddle/pull/70353)
-- 修复 Amp 相关 Bug。 [#66778](https://github.com/PaddlePaddle/Paddle/pull/66778), [#67582](https://github.com/PaddlePaddle/Paddle/pull/67582), [#67704](https://github.com/PaddlePaddle/Paddle/pull/67704), [#68655](https://github.com/PaddlePaddle/Paddle/pull/68655)
-- 修复 CINN 相关 Bug。 [#69577](https://github.com/PaddlePaddle/Paddle/pull/69577), [#71101](https://github.com/PaddlePaddle/Paddle/pull/71101), [#71387](https://github.com/PaddlePaddle/Paddle/pull/71387), [#71401](https://github.com/PaddlePaddle/Paddle/pull/71401)
-- 修复动转静相关 Bug。 [#67617](https://github.com/PaddlePaddle/Paddle/pull/67617), [#67936](https://github.com/PaddlePaddle/Paddle/pull/67936), [#68938](https://github.com/PaddlePaddle/Paddle/pull/68938), [#68734](https://github.com/PaddlePaddle/Paddle/pull/68734), [#69010](https://github.com/PaddlePaddle/Paddle/pull/69010), [#69408](https://github.com/PaddlePaddle/Paddle/pull/69408), [#69461](https://github.com/PaddlePaddle/Paddle/pull/69461), [#69699](https://github.com/PaddlePaddle/Paddle/pull/69699), [#69774](https://github.com/PaddlePaddle/Paddle/pull/69774), [#69803](https://github.com/PaddlePaddle/Paddle/pull/69803), [#69853](https://github.com/PaddlePaddle/Paddle/pull/69853), [#70510](https://github.com/PaddlePaddle/Paddle/pull/70510), [#70830](https://github.com/PaddlePaddle/Paddle/pull/70830), [#70904](https://github.com/PaddlePaddle/Paddle/pull/70904), [#70913](https://github.com/PaddlePaddle/Paddle/pull/70913), [#71040](https://github.com/PaddlePaddle/Paddle/pull/71040), [#71048](https://github.com/PaddlePaddle/Paddle/pull/71048), [#71106](https://github.com/PaddlePaddle/Paddle/pull/71106), [#71201](https://github.com/PaddlePaddle/Paddle/pull/71201), [#71216](https://github.com/PaddlePaddle/Paddle/pull/71216), [#71223](https://github.com/PaddlePaddle/Paddle/pull/71223), [#71296](https://github.com/PaddlePaddle/Paddle/pull/71296), [#71385](https://github.com/PaddlePaddle/Paddle/pull/71385), [#71505](https://github.com/PaddlePaddle/Paddle/pull/71505), [#66934](https://github.com/PaddlePaddle/Paddle/pull/66934), [#71096](https://github.com/PaddlePaddle/Paddle/pull/71096), [#71144](https://github.com/PaddlePaddle/Paddle/pull/71144), [#71430](https://github.com/PaddlePaddle/Paddle/pull/71430), [#71437](https://github.com/PaddlePaddle/Paddle/pull/71437), [#71473](https://github.com/PaddlePaddle/Paddle/pull/71473), [#71412](https://github.com/PaddlePaddle/Paddle/pull/71412), [#65648](https://github.com/PaddlePaddle/Paddle/pull/65648), 
[#67853](https://github.com/PaddlePaddle/Paddle/pull/67853), [#66543](https://github.com/PaddlePaddle/Paddle/pull/66543), [#68229](https://github.com/PaddlePaddle/Paddle/pull/68229), [#70846](https://github.com/PaddlePaddle/Paddle/pull/70846), [#67532](https://github.com/PaddlePaddle/Paddle/pull/67532)
-- 修复其他 Bug,包括反向传播梯度计算、内存拷贝、执行器报错等。 [#65493](https://github.com/PaddlePaddle/Paddle/pull/65493), [#65678](https://github.com/PaddlePaddle/Paddle/pull/65678), [#65673](https://github.com/PaddlePaddle/Paddle/pull/65673), [#65794](https://github.com/PaddlePaddle/Paddle/pull/65794), [#66358](https://github.com/PaddlePaddle/Paddle/pull/66358), [#66875](https://github.com/PaddlePaddle/Paddle/pull/66875), [#67339](https://github.com/PaddlePaddle/Paddle/pull/67339), [#67465](https://github.com/PaddlePaddle/Paddle/pull/67465), [#67754](https://github.com/PaddlePaddle/Paddle/pull/67754), [#67835](https://github.com/PaddlePaddle/Paddle/pull/67835), [#67892](https://github.com/PaddlePaddle/Paddle/pull/67892), [#67967](https://github.com/PaddlePaddle/Paddle/pull/67967), [#67952](https://github.com/PaddlePaddle/Paddle/pull/67952), [#68036](https://github.com/PaddlePaddle/Paddle/pull/68036), [#68063](https://github.com/PaddlePaddle/Paddle/pull/68063), [#68128](https://github.com/PaddlePaddle/Paddle/pull/68128), [#68151](https://github.com/PaddlePaddle/Paddle/pull/68151), [#68140](https://github.com/PaddlePaddle/Paddle/pull/68140), [#68167](https://github.com/PaddlePaddle/Paddle/pull/68167), [#68200](https://github.com/PaddlePaddle/Paddle/pull/68200), [#68325](https://github.com/PaddlePaddle/Paddle/pull/68325), [#68376](https://github.com/PaddlePaddle/Paddle/pull/68376), [#68539](https://github.com/PaddlePaddle/Paddle/pull/68539), [#68530](https://github.com/PaddlePaddle/Paddle/pull/68530), [#68637](https://github.com/PaddlePaddle/Paddle/pull/68637), [#68639](https://github.com/PaddlePaddle/Paddle/pull/68639), [#68688](https://github.com/PaddlePaddle/Paddle/pull/68688), [#68751](https://github.com/PaddlePaddle/Paddle/pull/68751), [#68806](https://github.com/PaddlePaddle/Paddle/pull/68806), [#68810](https://github.com/PaddlePaddle/Paddle/pull/68810), [#68779](https://github.com/PaddlePaddle/Paddle/pull/68779), [#68811](https://github.com/PaddlePaddle/Paddle/pull/68811), 
[#68844](https://github.com/PaddlePaddle/Paddle/pull/68844), [#68790](https://github.com/PaddlePaddle/Paddle/pull/68790), [#68870](https://github.com/PaddlePaddle/Paddle/pull/68870), [#68960](https://github.com/PaddlePaddle/Paddle/pull/68960), [#68999](https://github.com/PaddlePaddle/Paddle/pull/68999), [#69036](https://github.com/PaddlePaddle/Paddle/pull/69036), [#69188](https://github.com/PaddlePaddle/Paddle/pull/69188), [#69234](https://github.com/PaddlePaddle/Paddle/pull/69234), [#69375](https://github.com/PaddlePaddle/Paddle/pull/69375), [#69399](https://github.com/PaddlePaddle/Paddle/pull/69399), [#69538](https://github.com/PaddlePaddle/Paddle/pull/69538), [#69603](https://github.com/PaddlePaddle/Paddle/pull/69603), [#69633](https://github.com/PaddlePaddle/Paddle/pull/69633), [#69765](https://github.com/PaddlePaddle/Paddle/pull/69765), [#69768](https://github.com/PaddlePaddle/Paddle/pull/69768), [#69821](https://github.com/PaddlePaddle/Paddle/pull/69821), [#70091](https://github.com/PaddlePaddle/Paddle/pull/70091), [#70123](https://github.com/PaddlePaddle/Paddle/pull/70123), [#70147](https://github.com/PaddlePaddle/Paddle/pull/70147), [#70201](https://github.com/PaddlePaddle/Paddle/pull/70201), [#70198](https://github.com/PaddlePaddle/Paddle/pull/70198), [#69815](https://github.com/PaddlePaddle/Paddle/pull/69815), [#70420](https://github.com/PaddlePaddle/Paddle/pull/70420), [#70377](https://github.com/PaddlePaddle/Paddle/pull/70377), [#70552](https://github.com/PaddlePaddle/Paddle/pull/70552), [#70545](https://github.com/PaddlePaddle/Paddle/pull/70545), [#70595](https://github.com/PaddlePaddle/Paddle/pull/70595), [#70836](https://github.com/PaddlePaddle/Paddle/pull/70836), [#70771](https://github.com/PaddlePaddle/Paddle/pull/70771), [#70922](https://github.com/PaddlePaddle/Paddle/pull/70922), [#70969](https://github.com/PaddlePaddle/Paddle/pull/70969), [#70926](https://github.com/PaddlePaddle/Paddle/pull/70926), 
[#71117](https://github.com/PaddlePaddle/Paddle/pull/71117), [#71151](https://github.com/PaddlePaddle/Paddle/pull/71151), [#71194](https://github.com/PaddlePaddle/Paddle/pull/71194), [#71234](https://github.com/PaddlePaddle/Paddle/pull/71234), [#71339](https://github.com/PaddlePaddle/Paddle/pull/71339), [#71445](https://github.com/PaddlePaddle/Paddle/pull/71445), [#66350](https://github.com/PaddlePaddle/Paddle/pull/66350), [#66533](https://github.com/PaddlePaddle/Paddle/pull/66533), [#66622](https://github.com/PaddlePaddle/Paddle/pull/66622), [#67721](https://github.com/PaddlePaddle/Paddle/pull/67721), [#67700](https://github.com/PaddlePaddle/Paddle/pull/67700), [#69207](https://github.com/PaddlePaddle/Paddle/pull/69207), [#69615](https://github.com/PaddlePaddle/Paddle/pull/69615), [#69785](https://github.com/PaddlePaddle/Paddle/pull/69785), [#67805](https://github.com/PaddlePaddle/Paddle/pull/67805)
-
-### 功能优化
-
-- 支持 save/load。 [#65296](https://github.com/PaddlePaddle/Paddle/pull/65296), [#65671](https://github.com/PaddlePaddle/Paddle/pull/65671), [#66231](https://github.com/PaddlePaddle/Paddle/pull/66231), [#66185](https://github.com/PaddlePaddle/Paddle/pull/66185), [#66722](https://github.com/PaddlePaddle/Paddle/pull/66722), [#66863](https://github.com/PaddlePaddle/Paddle/pull/66863), [#67057](https://github.com/PaddlePaddle/Paddle/pull/67057), [#68101](https://github.com/PaddlePaddle/Paddle/pull/68101), [#68628](https://github.com/PaddlePaddle/Paddle/pull/68628), [#66359](https://github.com/PaddlePaddle/Paddle/pull/66359), [#68481](https://github.com/PaddlePaddle/Paddle/pull/68481)
-- 优化自定义算子编译流程。 [#67615](https://github.com/PaddlePaddle/Paddle/pull/67615), [#67659](https://github.com/PaddlePaddle/Paddle/pull/67659)
-- 支持组合算子。 [#69121](https://github.com/PaddlePaddle/Paddle/pull/69121), [#69144](https://github.com/PaddlePaddle/Paddle/pull/69144), [#70204](https://github.com/PaddlePaddle/Paddle/pull/70204), [#71098](https://github.com/PaddlePaddle/Paddle/pull/71098), [#71335](https://github.com/PaddlePaddle/Paddle/pull/71335)
-- 支持 CINN 编译器执行。 [#69589](https://github.com/PaddlePaddle/Paddle/pull/69589), [#70115](https://github.com/PaddlePaddle/Paddle/pull/70115)
-- 支持 custom device。 [#70909](https://github.com/PaddlePaddle/Paddle/pull/70909), [#71294](https://github.com/PaddlePaddle/Paddle/pull/71294), [#71362](https://github.com/PaddlePaddle/Paddle/pull/71362), [#71010](https://github.com/PaddlePaddle/Paddle/pull/71010), [#71036](https://github.com/PaddlePaddle/Paddle/pull/71036), [#70637](https://github.com/PaddlePaddle/Paddle/pull/70637), [#71085](https://github.com/PaddlePaddle/Paddle/pull/71085)
-- 其他场景的执行支持。 [#65050](https://github.com/PaddlePaddle/Paddle/pull/65050), [#65664](https://github.com/PaddlePaddle/Paddle/pull/65664), [#65741](https://github.com/PaddlePaddle/Paddle/pull/65741), [#65786](https://github.com/PaddlePaddle/Paddle/pull/65786), [#65499](https://github.com/PaddlePaddle/Paddle/pull/65499), [#66441](https://github.com/PaddlePaddle/Paddle/pull/66441), [#67668](https://github.com/PaddlePaddle/Paddle/pull/67668), [#68199](https://github.com/PaddlePaddle/Paddle/pull/68199), [#69088](https://github.com/PaddlePaddle/Paddle/pull/69088), [#70199](https://github.com/PaddlePaddle/Paddle/pull/70199), [#70308](https://github.com/PaddlePaddle/Paddle/pull/70308), [#70709](https://github.com/PaddlePaddle/Paddle/pull/70709), [#70937](https://github.com/PaddlePaddle/Paddle/pull/70937), [#71066](https://github.com/PaddlePaddle/Paddle/pull/71066), [#71079](https://github.com/PaddlePaddle/Paddle/pull/71079), [#71121](https://github.com/PaddlePaddle/Paddle/pull/71121), [#71136](https://github.com/PaddlePaddle/Paddle/pull/71136), [#71205](https://github.com/PaddlePaddle/Paddle/pull/71205)
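+Supports FP8 matrix operations to improve model training efficiency, and strengthens multiple models for better stability; backward interfaces can now be called through C_ops, which simplifies GPU memory optimization and feature experiments.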
### New features
-- SOT 适配 Python 3.13 版本字节码,支持 Python 3.13 下以 SOT 模式转静。[#68071](https://github.com/PaddlePaddle/Paddle/pull/68071), [#69126](https://github.com/PaddlePaddle/Paddle/pull/69126), [#69131](https://github.com/PaddlePaddle/Paddle/pull/69131), [#69196](https://github.com/PaddlePaddle/Paddle/pull/69196), [#69232](https://github.com/PaddlePaddle/Paddle/pull/69232), [#69253](https://github.com/PaddlePaddle/Paddle/pull/69253), [#69267](https://github.com/PaddlePaddle/Paddle/pull/69267), [#69412](https://github.com/PaddlePaddle/Paddle/pull/69412), [#69431](https://github.com/PaddlePaddle/Paddle/pull/69431), [#69432](https://github.com/PaddlePaddle/Paddle/pull/69432), [#69436](https://github.com/PaddlePaddle/Paddle/pull/69436), [#69557](https://github.com/PaddlePaddle/Paddle/pull/69557), [#69567](https://github.com/PaddlePaddle/Paddle/pull/69567), [#69700](https://github.com/PaddlePaddle/Paddle/pull/69700), [#69707](https://github.com/PaddlePaddle/Paddle/pull/69707), [#69735](https://github.com/PaddlePaddle/Paddle/pull/69735), [#69738](https://github.com/PaddlePaddle/Paddle/pull/69738), [#69744](https://github.com/PaddlePaddle/Paddle/pull/69744), [#69753](https://github.com/PaddlePaddle/Paddle/pull/69753), [#69887](https://github.com/PaddlePaddle/Paddle/pull/69887), [#69920](https://github.com/PaddlePaddle/Paddle/pull/69920), [#69950](https://github.com/PaddlePaddle/Paddle/pull/69950), [#70319](https://github.com/PaddlePaddle/Paddle/pull/70319), [#70927](https://github.com/PaddlePaddle/Paddle/pull/70927)
-- 适配 custom device。 [#68061](https://github.com/PaddlePaddle/Paddle/pull/68061), [#68836](https://github.com/PaddlePaddle/Paddle/pull/68836), [#70366](https://github.com/PaddlePaddle/Paddle/pull/70366), [#70549](https://github.com/PaddlePaddle/Paddle/pull/70549)
-- 适配 PIR 前向执行。 [#65335](https://github.com/PaddlePaddle/Paddle/pull/65335)
-- 适配 save/load。 [#67910](https://github.com/PaddlePaddle/Paddle/pull/67910)
-- 适配 pylayer。 [#70335](https://github.com/PaddlePaddle/Paddle/pull/70335)
-- 适配 lazy_init。 [#67379](https://github.com/PaddlePaddle/Paddle/pull/67379), [#67467](https://github.com/PaddlePaddle/Paddle/pull/67467)
-- 优化 PIR 下的逻辑。 [#67961](https://github.com/PaddlePaddle/Paddle/pull/67961)
-- 其他场景的支持。 [#68344](https://github.com/PaddlePaddle/Paddle/pull/68344), [#70071](https://github.com/PaddlePaddle/Paddle/pull/70071), [#70291](https://github.com/PaddlePaddle/Paddle/pull/70291), [#70752](https://github.com/PaddlePaddle/Paddle/pull/70752), [#70812](https://github.com/PaddlePaddle/Paddle/pull/70812), [#71033](https://github.com/PaddlePaddle/Paddle/pull/71033)
+- Added FP8 matrix multiplication acceleration, improving compute performance and precision adaptability. [#73092](https://github.com/PaddlePaddle/Paddle/pull/73092)
+- Added execution support for 0-size Tensors. [#71829](https://github.com/PaddlePaddle/Paddle/pull/71829), [#72263](https://github.com/PaddlePaddle/Paddle/pull/72263), [#72244](https://github.com/PaddlePaddle/Paddle/pull/72244), [#72814](https://github.com/PaddlePaddle/Paddle/pull/72814)
+- Added DeepEP support. [#73495](https://github.com/PaddlePaddle/Paddle/pull/73495)
+- Enabled the CINN backend by default. [#71838](https://github.com/PaddlePaddle/Paddle/pull/71838)
+- Added support for SOT-related execution. [#72472](https://github.com/PaddlePaddle/Paddle/pull/72472), [#72559](https://github.com/PaddlePaddle/Paddle/pull/72559), [#72466](https://github.com/PaddlePaddle/Paddle/pull/72466), [#73269](https://github.com/PaddlePaddle/Paddle/pull/73269), [#73329](https://github.com/PaddlePaddle/Paddle/pull/73329), [#73405](https://github.com/PaddlePaddle/Paddle/pull/73405), [#73399](https://github.com/PaddlePaddle/Paddle/pull/73399), [#73424](https://github.com/PaddlePaddle/Paddle/pull/73424), [#73509](https://github.com/PaddlePaddle/Paddle/pull/73509)
+- Added support for dynamic-to-static conversion. [#73417](https://github.com/PaddlePaddle/Paddle/pull/73417), [#73081](https://github.com/PaddlePaddle/Paddle/pull/73081)
+- Added kernels that support the stride mechanism. [#73053](https://github.com/PaddlePaddle/Paddle/pull/73053)
-### 普通用户无关改动
+### Bug fixes
-- SOT 调试体验优化,开发效率提升。[#67560](https://github.com/PaddlePaddle/Paddle/pull/67560), [#69072](https://github.com/PaddlePaddle/Paddle/pull/69072), [#69837](https://github.com/PaddlePaddle/Paddle/pull/69837), [#70134](https://github.com/PaddlePaddle/Paddle/pull/70134), [#70387](https://github.com/PaddlePaddle/Paddle/pull/70387), [#70740](https://github.com/PaddlePaddle/Paddle/pull/70740), [#71118](https://github.com/PaddlePaddle/Paddle/pull/71118), [#71268](https://github.com/PaddlePaddle/Paddle/pull/71268), [#71275](https://github.com/PaddlePaddle/Paddle/pull/71275), [#71458](https://github.com/PaddlePaddle/Paddle/pull/71458), [#71460](https://github.com/PaddlePaddle/Paddle/pull/71460)
-- 其他与用户使用无关的改动。 [#65393](https://github.com/PaddlePaddle/Paddle/pull/65393), [#65795](https://github.com/PaddlePaddle/Paddle/pull/65795), [#65799](https://github.com/PaddlePaddle/Paddle/pull/65799), [#65911](https://github.com/PaddlePaddle/Paddle/pull/65911), [#65977](https://github.com/PaddlePaddle/Paddle/pull/65977), [#66982](https://github.com/PaddlePaddle/Paddle/pull/66982), [#67563](https://github.com/PaddlePaddle/Paddle/pull/67563), [#68761](https://github.com/PaddlePaddle/Paddle/pull/68761), [#68909](https://github.com/PaddlePaddle/Paddle/pull/68909), [#69130](https://github.com/PaddlePaddle/Paddle/pull/69130), [#69233](https://github.com/PaddlePaddle/Paddle/pull/69233), [#69956](https://github.com/PaddlePaddle/Paddle/pull/69956), [#71142](https://github.com/PaddlePaddle/Paddle/pull/71142)
+- Performance and stability: improved training stability, strengthened Python 3.11+ support, improved the logic that automatically enables the CINN compiler in dynamic graph mode, fixed dynamic shape inference and gradient backpropagation issues, optimized GPU kernel execution efficiency (e.g. for_range, constant folding), and improved NPU memory copy and context management, raising large-scale model training performance and hardware utilization. [#71777](https://github.com/PaddlePaddle/Paddle/pull/71777), [#71837](https://github.com/PaddlePaddle/Paddle/pull/71837), [#71834](https://github.com/PaddlePaddle/Paddle/pull/71834), [#71950](https://github.com/PaddlePaddle/Paddle/pull/71950), [#71960](https://github.com/PaddlePaddle/Paddle/pull/71960), [#72103](https://github.com/PaddlePaddle/Paddle/pull/72103), [#70652](https://github.com/PaddlePaddle/Paddle/pull/70652), [#72313](https://github.com/PaddlePaddle/Paddle/pull/72313), [#72405](https://github.com/PaddlePaddle/Paddle/pull/72405), [#72581](https://github.com/PaddlePaddle/Paddle/pull/72581), [#73418](https://github.com/PaddlePaddle/Paddle/pull/73418)
+- Large Tensor support: extended operator support for very large Tensors, covering math operations (lerp/mean/bmm/trapezoid), tensor operations (arg_min_max/diag/prelu), padding (pad), comparison (allclose/isclose), and fused operators (softmax_mask_fuse), and resolved compatibility issues in mixed-precision training. [#71916](https://github.com/PaddlePaddle/Paddle/pull/71916), [#71970](https://github.com/PaddlePaddle/Paddle/pull/71970), [#72516](https://github.com/PaddlePaddle/Paddle/pull/72516), [#72517](https://github.com/PaddlePaddle/Paddle/pull/72517), [#72638](https://github.com/PaddlePaddle/Paddle/pull/72638), [#72652](https://github.com/PaddlePaddle/Paddle/pull/72652), [#73046](https://github.com/PaddlePaddle/Paddle/pull/73046), [#73093](https://github.com/PaddlePaddle/Paddle/pull/73093), [#73136](https://github.com/PaddlePaddle/Paddle/pull/73136), [#72679](https://github.com/PaddlePaddle/Paddle/pull/72679), [#73174](https://github.com/PaddlePaddle/Paddle/pull/73174), [#73198](https://github.com/PaddlePaddle/Paddle/pull/73198), [#73121](https://github.com/PaddlePaddle/Paddle/pull/73121), [#73096](https://github.com/PaddlePaddle/Paddle/pull/73096), [#73261](https://github.com/PaddlePaddle/Paddle/pull/73261), [#73201](https://github.com/PaddlePaddle/Paddle/pull/73201), [#73291](https://github.com/PaddlePaddle/Paddle/pull/73291), [#73373](https://github.com/PaddlePaddle/Paddle/pull/73373), [#73318](https://github.com/PaddlePaddle/Paddle/pull/73318), [#73436](https://github.com/PaddlePaddle/Paddle/pull/73436), [#72705](https://github.com/PaddlePaddle/Paddle/pull/72705), [#72276](https://github.com/PaddlePaddle/Paddle/pull/72276), [#73135](https://github.com/PaddlePaddle/Paddle/pull/73135), [#73304](https://github.com/PaddlePaddle/Paddle/pull/73304), [#73381](https://github.com/PaddlePaddle/Paddle/pull/73381), [#72712](https://github.com/PaddlePaddle/Paddle/pull/72712), [#72717](https://github.com/PaddlePaddle/Paddle/pull/72717), [#72634](https://github.com/PaddlePaddle/Paddle/pull/72634), [#72562](https://github.com/PaddlePaddle/Paddle/pull/72562), [#72628](https://github.com/PaddlePaddle/Paddle/pull/72628), 
[#72706](https://github.com/PaddlePaddle/Paddle/pull/72706), [#72831](https://github.com/PaddlePaddle/Paddle/pull/72831), [#72888](https://github.com/PaddlePaddle/Paddle/pull/72888), [#72753](https://github.com/PaddlePaddle/Paddle/pull/72753), [#72931](https://github.com/PaddlePaddle/Paddle/pull/72931), [#73021](https://github.com/PaddlePaddle/Paddle/pull/73021), [#73064](https://github.com/PaddlePaddle/Paddle/pull/73064), [#73069](https://github.com/PaddlePaddle/Paddle/pull/73069), [#73153](https://github.com/PaddlePaddle/Paddle/pull/73153), [#73118](https://github.com/PaddlePaddle/Paddle/pull/73118), [#73252](https://github.com/PaddlePaddle/Paddle/pull/73252), [#73253](https://github.com/PaddlePaddle/Paddle/pull/73253), [#73262](https://github.com/PaddlePaddle/Paddle/pull/73262), [#73259](https://github.com/PaddlePaddle/Paddle/pull/73259), [#73288](https://github.com/PaddlePaddle/Paddle/pull/73288), [#73105](https://github.com/PaddlePaddle/Paddle/pull/73105), [#73275](https://github.com/PaddlePaddle/Paddle/pull/73275), [#73284](https://github.com/PaddlePaddle/Paddle/pull/73284), [#73110](https://github.com/PaddlePaddle/Paddle/pull/73110), [#73335](https://github.com/PaddlePaddle/Paddle/pull/73335), [#73342](https://github.com/PaddlePaddle/Paddle/pull/73342), [#73447](https://github.com/PaddlePaddle/Paddle/pull/73447), [#73460](https://github.com/PaddlePaddle/Paddle/pull/73460), [#73194](https://github.com/PaddlePaddle/Paddle/pull/73194)
+- 0-size Tensor fixes: fixed computation errors caused by 0-size Tensors, covering pooling (max_pool1d/lp_pool1d), rank computation (matrix_rank), statistics (std/nanmedian), and element-wise operations (elementwise compare), to ensure numerical stability and API consistency for extreme inputs. [#71961](https://github.com/PaddlePaddle/Paddle/pull/71961), [#72017](https://github.com/PaddlePaddle/Paddle/pull/72017), [#72785](https://github.com/PaddlePaddle/Paddle/pull/72785), [#73214](https://github.com/PaddlePaddle/Paddle/pull/73214), [#73263](https://github.com/PaddlePaddle/Paddle/pull/73263), [#73267](https://github.com/PaddlePaddle/Paddle/pull/73267), [#73280](https://github.com/PaddlePaddle/Paddle/pull/73280), [#72444](https://github.com/PaddlePaddle/Paddle/pull/72444), [#72437](https://github.com/PaddlePaddle/Paddle/pull/72437), [#72460](https://github.com/PaddlePaddle/Paddle/pull/72460), [#73090](https://github.com/PaddlePaddle/Paddle/pull/73090), [#73516](https://github.com/PaddlePaddle/Paddle/pull/73516), [#72807](https://github.com/PaddlePaddle/Paddle/pull/72807), [#72799](https://github.com/PaddlePaddle/Paddle/pull/72799), [#72800](https://github.com/PaddlePaddle/Paddle/pull/72800), [#72809](https://github.com/PaddlePaddle/Paddle/pull/72809), [#73497](https://github.com/PaddlePaddle/Paddle/pull/73497)
+- API enhancements and compatibility: added support for Python standard library types (dataclasses), broadened API data type compatibility (bfloat16 parameter creation, automatic inference of -1 dimensions), fixed errors when interacting with NumPy APIs, and optimized the BatchNorm memory layout. [#72059](https://github.com/PaddlePaddle/Paddle/pull/72059), [#72283](https://github.com/PaddlePaddle/Paddle/pull/72283), [#72451](https://github.com/PaddlePaddle/Paddle/pull/72451), [#72512](https://github.com/PaddlePaddle/Paddle/pull/72512), [#72618](https://github.com/PaddlePaddle/Paddle/pull/72618), [#72976](https://github.com/PaddlePaddle/Paddle/pull/72976), [#73084](https://github.com/PaddlePaddle/Paddle/pull/73084), [#73205](https://github.com/PaddlePaddle/Paddle/pull/73205), [#73250](https://github.com/PaddlePaddle/Paddle/pull/73250), [#73111](https://github.com/PaddlePaddle/Paddle/pull/73111), [#73260](https://github.com/PaddlePaddle/Paddle/pull/73260), [#72094](https://github.com/PaddlePaddle/Paddle/pull/72094), [#71844](https://github.com/PaddlePaddle/Paddle/pull/71844), [#71357](https://github.com/PaddlePaddle/Paddle/pull/71357)
+- Memory management and error fixes: resolved high-severity issues such as out-of-bounds memory access (set_value/nonzero), null pointers (data nullptr), and CUDA graph allocation failures; fixed memory leaks and computation errors in core operations such as gradient clipping (clip_grad), tensor assignment (assign), and broadcasting (broadcast); and improved NPU asynchronous execution and the predictor's GIL release logic, making the system more robust. [#71895](https://github.com/PaddlePaddle/Paddle/pull/71895), [#72101](https://github.com/PaddlePaddle/Paddle/pull/72101), [#72133](https://github.com/PaddlePaddle/Paddle/pull/72133), [#72149](https://github.com/PaddlePaddle/Paddle/pull/72149), [#72176](https://github.com/PaddlePaddle/Paddle/pull/72176), [#72314](https://github.com/PaddlePaddle/Paddle/pull/72314), [#72256](https://github.com/PaddlePaddle/Paddle/pull/72256), [#72757](https://github.com/PaddlePaddle/Paddle/pull/72757), [#72749](https://github.com/PaddlePaddle/Paddle/pull/72749), [#72792](https://github.com/PaddlePaddle/Paddle/pull/72792), [#72815](https://github.com/PaddlePaddle/Paddle/pull/72815), [#72819](https://github.com/PaddlePaddle/Paddle/pull/72819), [#72958](https://github.com/PaddlePaddle/Paddle/pull/72958), [#73023](https://github.com/PaddlePaddle/Paddle/pull/73023), [#73103](https://github.com/PaddlePaddle/Paddle/pull/73103), [#73014](https://github.com/PaddlePaddle/Paddle/pull/73014), [#73137](https://github.com/PaddlePaddle/Paddle/pull/73137), [#73256](https://github.com/PaddlePaddle/Paddle/pull/73256), [#73211](https://github.com/PaddlePaddle/Paddle/pull/73211), [#73251](https://github.com/PaddlePaddle/Paddle/pull/73251), [#73210](https://github.com/PaddlePaddle/Paddle/pull/73210), [#73415](https://github.com/PaddlePaddle/Paddle/pull/73415), [#73206](https://github.com/PaddlePaddle/Paddle/pull/73206), [#71983](https://github.com/PaddlePaddle/Paddle/pull/71983), [#72485](https://github.com/PaddlePaddle/Paddle/pull/72485), [#72561](https://github.com/PaddlePaddle/Paddle/pull/72561)
+- Other important fixes: fixed defects in modules such as scientific computing and save/load, improved the kernel configuration of the Slice operator, optimized the fallback strategy for dynamic shape inference, and refined exception raising and type checking logic. [#71810](https://github.com/PaddlePaddle/Paddle/pull/71810), [#72246](https://github.com/PaddlePaddle/Paddle/pull/72246), [#72378](https://github.com/PaddlePaddle/Paddle/pull/72378), [#72467](https://github.com/PaddlePaddle/Paddle/pull/72467), [#72635](https://github.com/PaddlePaddle/Paddle/pull/72635), [#72751](https://github.com/PaddlePaddle/Paddle/pull/72751), [#72044](https://github.com/PaddlePaddle/Paddle/pull/72044), [#72051](https://github.com/PaddlePaddle/Paddle/pull/72051), [#73231](https://github.com/PaddlePaddle/Paddle/pull/73231), [#73109](https://github.com/PaddlePaddle/Paddle/pull/73109)
+- Fixed SOT-related issues. [#71932](https://github.com/PaddlePaddle/Paddle/pull/71932), [#71971](https://github.com/PaddlePaddle/Paddle/pull/71971), [#72194](https://github.com/PaddlePaddle/Paddle/pull/72194), [#72288](https://github.com/PaddlePaddle/Paddle/pull/72288), [#72306](https://github.com/PaddlePaddle/Paddle/pull/72306), [#72367](https://github.com/PaddlePaddle/Paddle/pull/72367), [#72495](https://github.com/PaddlePaddle/Paddle/pull/72495), [#72522](https://github.com/PaddlePaddle/Paddle/pull/72522), [#72704](https://github.com/PaddlePaddle/Paddle/pull/72704), [#72631](https://github.com/PaddlePaddle/Paddle/pull/72631), [#72737](https://github.com/PaddlePaddle/Paddle/pull/72737), [#73067](https://github.com/PaddlePaddle/Paddle/pull/73067), [#73030](https://github.com/PaddlePaddle/Paddle/pull/73030), [#73059](https://github.com/PaddlePaddle/Paddle/pull/73059), [#73282](https://github.com/PaddlePaddle/Paddle/pull/73282), [#73511](https://github.com/PaddlePaddle/Paddle/pull/73511), [#73526](https://github.com/PaddlePaddle/Paddle/pull/73526), [#73549](https://github.com/PaddlePaddle/Paddle/pull/73549), [#73515](https://github.com/PaddlePaddle/Paddle/pull/73515)
-### 安全问题
+### Enhancements
-- 为 IR(中间表示)的保存/加载操作引入了审批规则,以增强模型序列化过程中的安全性和治理。 [#65737](https://github.com/PaddlePaddle/Paddle/pull/65737)
+- Built out the 0-size mechanism for Paddle APIs. [#72721](https://github.com/PaddlePaddle/Paddle/pull/72721), [#72756](https://github.com/PaddlePaddle/Paddle/pull/72756), [#72790](https://github.com/PaddlePaddle/Paddle/pull/72790), [#72806](https://github.com/PaddlePaddle/Paddle/pull/72806), [#72764](https://github.com/PaddlePaddle/Paddle/pull/72764), [#72786](https://github.com/PaddlePaddle/Paddle/pull/72786), [#72853](https://github.com/PaddlePaddle/Paddle/pull/72853), [#72826](https://github.com/PaddlePaddle/Paddle/pull/72826), [#72851](https://github.com/PaddlePaddle/Paddle/pull/72851), [#72928](https://github.com/PaddlePaddle/Paddle/pull/72928), [#72912](https://github.com/PaddlePaddle/Paddle/pull/72912), [#72922](https://github.com/PaddlePaddle/Paddle/pull/72922), [#72924](https://github.com/PaddlePaddle/Paddle/pull/72924), [#72887](https://github.com/PaddlePaddle/Paddle/pull/72887), [#72921](https://github.com/PaddlePaddle/Paddle/pull/72921), [#72906](https://github.com/PaddlePaddle/Paddle/pull/72906), [#72895](https://github.com/PaddlePaddle/Paddle/pull/72895), [#72821](https://github.com/PaddlePaddle/Paddle/pull/72821), [#72914](https://github.com/PaddlePaddle/Paddle/pull/72914), [#72936](https://github.com/PaddlePaddle/Paddle/pull/72936), [#72943](https://github.com/PaddlePaddle/Paddle/pull/72943), [#72694](https://github.com/PaddlePaddle/Paddle/pull/72694), [#72919](https://github.com/PaddlePaddle/Paddle/pull/72919), [#72940](https://github.com/PaddlePaddle/Paddle/pull/72940), [#72820](https://github.com/PaddlePaddle/Paddle/pull/72820), [#72934](https://github.com/PaddlePaddle/Paddle/pull/72934), [#72975](https://github.com/PaddlePaddle/Paddle/pull/72975), [#72872](https://github.com/PaddlePaddle/Paddle/pull/72872), [#72984](https://github.com/PaddlePaddle/Paddle/pull/72984), [#72988](https://github.com/PaddlePaddle/Paddle/pull/72988), [#72972](https://github.com/PaddlePaddle/Paddle/pull/72972), [#72977](https://github.com/PaddlePaddle/Paddle/pull/72977), 
[#72937](https://github.com/PaddlePaddle/Paddle/pull/72937), [#73086](https://github.com/PaddlePaddle/Paddle/pull/73086), [#73042](https://github.com/PaddlePaddle/Paddle/pull/73042), [#73017](https://github.com/PaddlePaddle/Paddle/pull/73017), [#73044](https://github.com/PaddlePaddle/Paddle/pull/73044), [#73077](https://github.com/PaddlePaddle/Paddle/pull/73077), [#73108](https://github.com/PaddlePaddle/Paddle/pull/73108), [#73027](https://github.com/PaddlePaddle/Paddle/pull/73027), [#72970](https://github.com/PaddlePaddle/Paddle/pull/72970), [#73008](https://github.com/PaddlePaddle/Paddle/pull/73008), [#72996](https://github.com/PaddlePaddle/Paddle/pull/72996), [#73165](https://github.com/PaddlePaddle/Paddle/pull/73165), [#73166](https://github.com/PaddlePaddle/Paddle/pull/73166), [#73170](https://github.com/PaddlePaddle/Paddle/pull/73170), [#73122](https://github.com/PaddlePaddle/Paddle/pull/73122), [#73204](https://github.com/PaddlePaddle/Paddle/pull/73204), [#73207](https://github.com/PaddlePaddle/Paddle/pull/73207), [#73186](https://github.com/PaddlePaddle/Paddle/pull/73186), [#73197](https://github.com/PaddlePaddle/Paddle/pull/73197), [#73168](https://github.com/PaddlePaddle/Paddle/pull/73168), [#73172](https://github.com/PaddlePaddle/Paddle/pull/73172), [#73125](https://github.com/PaddlePaddle/Paddle/pull/73125), [#73181](https://github.com/PaddlePaddle/Paddle/pull/73181), [#73270](https://github.com/PaddlePaddle/Paddle/pull/73270), [#73028](https://github.com/PaddlePaddle/Paddle/pull/73028), [#73094](https://github.com/PaddlePaddle/Paddle/pull/73094), [#73180](https://github.com/PaddlePaddle/Paddle/pull/73180), [#73276](https://github.com/PaddlePaddle/Paddle/pull/73276), [#73333](https://github.com/PaddlePaddle/Paddle/pull/73333), [#73341](https://github.com/PaddlePaddle/Paddle/pull/73341), [#73299](https://github.com/PaddlePaddle/Paddle/pull/73299), [#73346](https://github.com/PaddlePaddle/Paddle/pull/73346), 
[#73361](https://github.com/PaddlePaddle/Paddle/pull/73361), [#73375](https://github.com/PaddlePaddle/Paddle/pull/73375), [#73152](https://github.com/PaddlePaddle/Paddle/pull/73152), [#73377](https://github.com/PaddlePaddle/Paddle/pull/73377), [#73355](https://github.com/PaddlePaddle/Paddle/pull/73355), [#73382](https://github.com/PaddlePaddle/Paddle/pull/73382), [#73385](https://github.com/PaddlePaddle/Paddle/pull/73385), [#73386](https://github.com/PaddlePaddle/Paddle/pull/73386), [#73352](https://github.com/PaddlePaddle/Paddle/pull/73352), [#73387](https://github.com/PaddlePaddle/Paddle/pull/73387), [#73401](https://github.com/PaddlePaddle/Paddle/pull/73401), [#73384](https://github.com/PaddlePaddle/Paddle/pull/73384), [#73450](https://github.com/PaddlePaddle/Paddle/pull/73450), [#73437](https://github.com/PaddlePaddle/Paddle/pull/73437), [#73503](https://github.com/PaddlePaddle/Paddle/pull/73503), [#73507](https://github.com/PaddlePaddle/Paddle/pull/73507), [#73477](https://github.com/PaddlePaddle/Paddle/pull/73477), [#73513](https://github.com/PaddlePaddle/Paddle/pull/73513), [#73525](https://github.com/PaddlePaddle/Paddle/pull/73525), [#73528](https://github.com/PaddlePaddle/Paddle/pull/73528), [#73517](https://github.com/PaddlePaddle/Paddle/pull/73517), [#72898](https://github.com/PaddlePaddle/Paddle/pull/72898), [#72880](https://github.com/PaddlePaddle/Paddle/pull/72880), [#72864](https://github.com/PaddlePaddle/Paddle/pull/72864), [#72993](https://github.com/PaddlePaddle/Paddle/pull/72993), [#72954](https://github.com/PaddlePaddle/Paddle/pull/72954), [#72866](https://github.com/PaddlePaddle/Paddle/pull/72866), [#72878](https://github.com/PaddlePaddle/Paddle/pull/72878), [#72889](https://github.com/PaddlePaddle/Paddle/pull/72889), [#72861](https://github.com/PaddlePaddle/Paddle/pull/72861), [#72837](https://github.com/PaddlePaddle/Paddle/pull/72837)
+- SOT improvements: extended functionality (such as NumPy interoperability and `super` support), improved training stability, and fixed multiple issues to make the code more robust. [#71763](https://github.com/PaddlePaddle/Paddle/pull/71763), [#71666](https://github.com/PaddlePaddle/Paddle/pull/71666), [#71858](https://github.com/PaddlePaddle/Paddle/pull/71858), [#71865](https://github.com/PaddlePaddle/Paddle/pull/71865), [#72474](https://github.com/PaddlePaddle/Paddle/pull/72474), [#72154](https://github.com/PaddlePaddle/Paddle/pull/72154), [#72784](https://github.com/PaddlePaddle/Paddle/pull/72784), [#72956](https://github.com/PaddlePaddle/Paddle/pull/72956), [#73038](https://github.com/PaddlePaddle/Paddle/pull/73038), [#73066](https://github.com/PaddlePaddle/Paddle/pull/73066), [#73287](https://github.com/PaddlePaddle/Paddle/pull/73287), [#73278](https://github.com/PaddlePaddle/Paddle/pull/73278), [#73332](https://github.com/PaddlePaddle/Paddle/pull/73332), [#73372](https://github.com/PaddlePaddle/Paddle/pull/73372), [#73412](https://github.com/PaddlePaddle/Paddle/pull/73412), [#73407](https://github.com/PaddlePaddle/Paddle/pull/73407), [#73506](https://github.com/PaddlePaddle/Paddle/pull/73506)
+- Code style refactoring: improved code quality and maintainability through refactoring and unifying kernel behavior across platforms, and added a pre-commit check for YAML formatting. [#72216](https://github.com/PaddlePaddle/Paddle/pull/72216), [#72360](https://github.com/PaddlePaddle/Paddle/pull/72360), [#72816](https://github.com/PaddlePaddle/Paddle/pull/72816), [#72969](https://github.com/PaddlePaddle/Paddle/pull/72969), [#73106](https://github.com/PaddlePaddle/Paddle/pull/73106), [#72825](https://github.com/PaddlePaddle/Paddle/pull/72825), [#73150](https://github.com/PaddlePaddle/Paddle/pull/73150), [#73151](https://github.com/PaddlePaddle/Paddle/pull/73151), [#73158](https://github.com/PaddlePaddle/Paddle/pull/73158), [#73101](https://github.com/PaddlePaddle/Paddle/pull/73101), [#73326](https://github.com/PaddlePaddle/Paddle/pull/73326), [#72580](https://github.com/PaddlePaddle/Paddle/pull/72580), [#72424](https://github.com/PaddlePaddle/Paddle/pull/72424)
+- Rolled out fixes for Paddle CPU/GPU Kernel precision issues across the board. [#72879](https://github.com/PaddlePaddle/Paddle/pull/72879), [#72894](https://github.com/PaddlePaddle/Paddle/pull/72894), [#73012](https://github.com/PaddlePaddle/Paddle/pull/73012), [#72973](https://github.com/PaddlePaddle/Paddle/pull/72973), [#73018](https://github.com/PaddlePaddle/Paddle/pull/73018), [#72965](https://github.com/PaddlePaddle/Paddle/pull/72965), [#73128](https://github.com/PaddlePaddle/Paddle/pull/73128), [#73229](https://github.com/PaddlePaddle/Paddle/pull/73229), [#72992](https://github.com/PaddlePaddle/Paddle/pull/72992), [#73344](https://github.com/PaddlePaddle/Paddle/pull/73344), [#73274](https://github.com/PaddlePaddle/Paddle/pull/73274), [#73295](https://github.com/PaddlePaddle/Paddle/pull/73295), [#73293](https://github.com/PaddlePaddle/Paddle/pull/73293), [#73317](https://github.com/PaddlePaddle/Paddle/pull/73317), [#73320](https://github.com/PaddlePaddle/Paddle/pull/73320), [#73454](https://github.com/PaddlePaddle/Paddle/pull/73454), [#73492](https://github.com/PaddlePaddle/Paddle/pull/73492), [#73535](https://github.com/PaddlePaddle/Paddle/pull/73535)
-### Others
+- slice fixes: fixed slice-related issues, including indexing logic and performance. [#72644](https://github.com/PaddlePaddle/Paddle/pull/72644), [#72676](https://github.com/PaddlePaddle/Paddle/pull/72676), [#72838](https://github.com/PaddlePaddle/Paddle/pull/72838), [#72966](https://github.com/PaddlePaddle/Paddle/pull/72966), [#73095](https://github.com/PaddlePaddle/Paddle/pull/73095), [#72840](https://github.com/PaddlePaddle/Paddle/pull/72840), [#73112](https://github.com/PaddlePaddle/Paddle/pull/73112), [#73367](https://github.com/PaddlePaddle/Paddle/pull/73367), [#73390](https://github.com/PaddlePaddle/Paddle/pull/73390), [#73307](https://github.com/PaddlePaddle/Paddle/pull/73307), [#73465](https://github.com/PaddlePaddle/Paddle/pull/73465), [#73362](https://github.com/PaddlePaddle/Paddle/pull/73362), [#72733](https://github.com/PaddlePaddle/Paddle/pull/72733), [#72886](https://github.com/PaddlePaddle/Paddle/pull/72886)
+- Performance optimization: improved overall performance, including optimized indexing logic. [#72707](https://github.com/PaddlePaddle/Paddle/pull/72707), [#73485](https://github.com/PaddlePaddle/Paddle/pull/73485)
+- Other notable improvements: dynamic shape support, a meshgrid fix with added unit tests, upgrading CUB to 2.1.0, improved FP8 numeric handling, an optimized shared pool mechanism for CUDA Graphs, removal of ShadowFeedOp to simplify data flow, better version compatibility for PIR model save/load, fixes to the flip and reverse kernels, improved NaN propagation in paddle.angle, an asynchronous GC check mechanism, an optimized lock-free Scope interface for Dy2St, removal of an unused third-party dependency (absl), and further decoupling of PHI from Fluid, improving the framework's stability, performance, and extensibility. [#72356](https://github.com/PaddlePaddle/Paddle/pull/72356), [#72380](https://github.com/PaddlePaddle/Paddle/pull/72380), [#72633](https://github.com/PaddlePaddle/Paddle/pull/72633), [#72794](https://github.com/PaddlePaddle/Paddle/pull/72794), [#72917](https://github.com/PaddlePaddle/Paddle/pull/72917), [#72920](https://github.com/PaddlePaddle/Paddle/pull/72920), [#72945](https://github.com/PaddlePaddle/Paddle/pull/72945), [#72620](https://github.com/PaddlePaddle/Paddle/pull/72620), [#73011](https://github.com/PaddlePaddle/Paddle/pull/73011), [#73051](https://github.com/PaddlePaddle/Paddle/pull/73051), [#73052](https://github.com/PaddlePaddle/Paddle/pull/73052), [#73075](https://github.com/PaddlePaddle/Paddle/pull/73075), [#73176](https://github.com/PaddlePaddle/Paddle/pull/73176), [#73191](https://github.com/PaddlePaddle/Paddle/pull/73191), [#73337](https://github.com/PaddlePaddle/Paddle/pull/73337), [#73311](https://github.com/PaddlePaddle/Paddle/pull/73311), [#73173](https://github.com/PaddlePaddle/Paddle/pull/73173), [#73239](https://github.com/PaddlePaddle/Paddle/pull/73239), [#73448](https://github.com/PaddlePaddle/Paddle/pull/73448), [#73478](https://github.com/PaddlePaddle/Paddle/pull/73478), [#73522](https://github.com/PaddlePaddle/Paddle/pull/73522), [#73369](https://github.com/PaddlePaddle/Paddle/pull/73369)
-- Migrated the Sparse API. [#66139](https://github.com/PaddlePaddle/Paddle/pull/66139), [#66319](https://github.com/PaddlePaddle/Paddle/pull/66319), [#66866](https://github.com/PaddlePaddle/Paddle/pull/66866)
-- Extended PIR functionality. [#67966](https://github.com/PaddlePaddle/Paddle/pull/67966), [#69909](https://github.com/PaddlePaddle/Paddle/pull/69909)
-- Relocated files. [#66477](https://github.com/PaddlePaddle/Paddle/pull/66477), [#66824](https://github.com/PaddlePaddle/Paddle/pull/66824), [#67592](https://github.com/PaddlePaddle/Paddle/pull/67592)
-- Added logging. [#68382](https://github.com/PaddlePaddle/Paddle/pull/68382), [#70506](https://github.com/PaddlePaddle/Paddle/pull/70506)
-- Enabled PIR by default. [#68278](https://github.com/PaddlePaddle/Paddle/pull/68278)
-- Cleaned up header files. [#68422](https://github.com/PaddlePaddle/Paddle/pull/68422), [#68471](https://github.com/PaddlePaddle/Paddle/pull/68471)
-- Compilation optimizations. [#67831](https://github.com/PaddlePaddle/Paddle/pull/67831), [#67821](https://github.com/PaddlePaddle/Paddle/pull/67821), [#68717](https://github.com/PaddlePaddle/Paddle/pull/68717)
-- Managed the related tests with guards. [#67816](https://github.com/PaddlePaddle/Paddle/pull/67816), [#67827](https://github.com/PaddlePaddle/Paddle/pull/67827), [#67989](https://github.com/PaddlePaddle/Paddle/pull/67989)
-- Fixed typos. [#70784](https://github.com/PaddlePaddle/Paddle/pull/70784), [#70787](https://github.com/PaddlePaddle/Paddle/pull/70787)
-- Added CUDA error checks. [#70399](https://github.com/PaddlePaddle/Paddle/pull/70399)
-
-### Developers
-
-- 动转静功能修复,提升整图转换成功率,优化推理导出体验。[#65291](https://github.com/PaddlePaddle/Paddle/pull/65291), [#66153](https://github.com/PaddlePaddle/Paddle/pull/66153), [#66379](https://github.com/PaddlePaddle/Paddle/pull/66379), [#66557](https://github.com/PaddlePaddle/Paddle/pull/66557), [#67021](https://github.com/PaddlePaddle/Paddle/pull/67021), [#67482](https://github.com/PaddlePaddle/Paddle/pull/67482), [#67495](https://github.com/PaddlePaddle/Paddle/pull/67495), [#67981](https://github.com/PaddlePaddle/Paddle/pull/67981), [#68030](https://github.com/PaddlePaddle/Paddle/pull/68030), [#68078](https://github.com/PaddlePaddle/Paddle/pull/68078), [#68328](https://github.com/PaddlePaddle/Paddle/pull/68328), [#68442](https://github.com/PaddlePaddle/Paddle/pull/68442), [#68679](https://github.com/PaddlePaddle/Paddle/pull/68679), [#68850](https://github.com/PaddlePaddle/Paddle/pull/68850), [#68892](https://github.com/PaddlePaddle/Paddle/pull/68892), [#68991](https://github.com/PaddlePaddle/Paddle/pull/68991), [#69043](https://github.com/PaddlePaddle/Paddle/pull/69043), [#69097](https://github.com/PaddlePaddle/Paddle/pull/69097), [#69210](https://github.com/PaddlePaddle/Paddle/pull/69210), [#69295](https://github.com/PaddlePaddle/Paddle/pull/69295), [#69428](https://github.com/PaddlePaddle/Paddle/pull/69428), [#69518](https://github.com/PaddlePaddle/Paddle/pull/69518), [#69642](https://github.com/PaddlePaddle/Paddle/pull/69642), [#69940](https://github.com/PaddlePaddle/Paddle/pull/69940), [#70118](https://github.com/PaddlePaddle/Paddle/pull/70118), [#70169](https://github.com/PaddlePaddle/Paddle/pull/70169), [#70218](https://github.com/PaddlePaddle/Paddle/pull/70218), [#70287](https://github.com/PaddlePaddle/Paddle/pull/70287), [#70412](https://github.com/PaddlePaddle/Paddle/pull/70412), [#71099](https://github.com/PaddlePaddle/Paddle/pull/71099), [#71156](https://github.com/PaddlePaddle/Paddle/pull/71156), [#71193](https://github.com/PaddlePaddle/Paddle/pull/71193), 
[#71336](https://github.com/PaddlePaddle/Paddle/pull/71336), [#71463](https://github.com/PaddlePaddle/Paddle/pull/71463), [#71476](https://github.com/PaddlePaddle/Paddle/pull/71476), [#71503](https://github.com/PaddlePaddle/Paddle/pull/71503)
-- Upgraded the Inplace strategy. [#65491](https://github.com/PaddlePaddle/Paddle/pull/65491)
-- Control-flow development. [#67251](https://github.com/PaddlePaddle/Paddle/pull/67251)
-- Added environment variables. [#68467](https://github.com/PaddlePaddle/Paddle/pull/68467)
-- Added support for sparse operator computation. [#67111](https://github.com/PaddlePaddle/Paddle/pull/67111)
-- 其他执行支持开发,包括逻辑优化、版本适配、添加单测等。 [#69241](https://github.com/PaddlePaddle/Paddle/pull/69241), [#69806](https://github.com/PaddlePaddle/Paddle/pull/69806), [#70768](https://github.com/PaddlePaddle/Paddle/pull/70768), [#66829](https://github.com/PaddlePaddle/Paddle/pull/66829), [#67110](https://github.com/PaddlePaddle/Paddle/pull/67110), [#67442](https://github.com/PaddlePaddle/Paddle/pull/67442), [#67041](https://github.com/PaddlePaddle/Paddle/pull/67041), [#67452](https://github.com/PaddlePaddle/Paddle/pull/67452), [#69061](https://github.com/PaddlePaddle/Paddle/pull/69061), [#69307](https://github.com/PaddlePaddle/Paddle/pull/69307), [#68669](https://github.com/PaddlePaddle/Paddle/pull/68669), [#69829](https://github.com/PaddlePaddle/Paddle/pull/69829), [#70003](https://github.com/PaddlePaddle/Paddle/pull/70003), [#70443](https://github.com/PaddlePaddle/Paddle/pull/70443), [#70364](https://github.com/PaddlePaddle/Paddle/pull/70364), [#71495](https://github.com/PaddlePaddle/Paddle/pull/71495)
-
-### Performance Optimization
+### Performance Improvements
-- 优化动态 shape 场景转静能力,降低构图次数,减少编译时间。[#65235](https://github.com/PaddlePaddle/Paddle/pull/65235), [#65477](https://github.com/PaddlePaddle/Paddle/pull/65477), [#65517](https://github.com/PaddlePaddle/Paddle/pull/65517), [#65882](https://github.com/PaddlePaddle/Paddle/pull/65882), [#66346](https://github.com/PaddlePaddle/Paddle/pull/66346), [#66746](https://github.com/PaddlePaddle/Paddle/pull/66746), [#67786](https://github.com/PaddlePaddle/Paddle/pull/67786), [#67876](https://github.com/PaddlePaddle/Paddle/pull/67876), [#68113](https://github.com/PaddlePaddle/Paddle/pull/68113), [#68302](https://github.com/PaddlePaddle/Paddle/pull/68302), [#68337](https://github.com/PaddlePaddle/Paddle/pull/68337), [#68616](https://github.com/PaddlePaddle/Paddle/pull/68616), [#69354](https://github.com/PaddlePaddle/Paddle/pull/69354), [#70009](https://github.com/PaddlePaddle/Paddle/pull/70009), [#70877](https://github.com/PaddlePaddle/Paddle/pull/70877)
-- SOT 端到端性能优化,减少子图打断,降低调度开销,提升转静训练性能。[#67591](https://github.com/PaddlePaddle/Paddle/pull/67591), [#67746](https://github.com/PaddlePaddle/Paddle/pull/67746), [#67823](https://github.com/PaddlePaddle/Paddle/pull/67823), [#67890](https://github.com/PaddlePaddle/Paddle/pull/67890), [#67921](https://github.com/PaddlePaddle/Paddle/pull/67921), [#68031](https://github.com/PaddlePaddle/Paddle/pull/68031), [#68153](https://github.com/PaddlePaddle/Paddle/pull/68153), [#68729](https://github.com/PaddlePaddle/Paddle/pull/68729), [#69249](https://github.com/PaddlePaddle/Paddle/pull/69249), [#69263](https://github.com/PaddlePaddle/Paddle/pull/69263), [#69300](https://github.com/PaddlePaddle/Paddle/pull/69300), [#69313](https://github.com/PaddlePaddle/Paddle/pull/69313), [#69325](https://github.com/PaddlePaddle/Paddle/pull/69325), [#69353](https://github.com/PaddlePaddle/Paddle/pull/69353), [#69411](https://github.com/PaddlePaddle/Paddle/pull/69411), [#69506](https://github.com/PaddlePaddle/Paddle/pull/69506), [#69672](https://github.com/PaddlePaddle/Paddle/pull/69672), [#69746](https://github.com/PaddlePaddle/Paddle/pull/69746), [#69834](https://github.com/PaddlePaddle/Paddle/pull/69834), [#69836](https://github.com/PaddlePaddle/Paddle/pull/69836), [#69852](https://github.com/PaddlePaddle/Paddle/pull/69852), [#69975](https://github.com/PaddlePaddle/Paddle/pull/69975), [#70151](https://github.com/PaddlePaddle/Paddle/pull/70151), [#70293](https://github.com/PaddlePaddle/Paddle/pull/70293), [#70405](https://github.com/PaddlePaddle/Paddle/pull/70405), [#70851](https://github.com/PaddlePaddle/Paddle/pull/70851), [#71039](https://github.com/PaddlePaddle/Paddle/pull/71039), [#71254](https://github.com/PaddlePaddle/Paddle/pull/71254), [#71295](https://github.com/PaddlePaddle/Paddle/pull/71295), [#71298](https://github.com/PaddlePaddle/Paddle/pull/71298), [#71346](https://github.com/PaddlePaddle/Paddle/pull/71346), [#71377](https://github.com/PaddlePaddle/Paddle/pull/71377), 
[#71407](https://github.com/PaddlePaddle/Paddle/pull/71407)
-- Optimized performance in dynamic-shape scenarios. [#68491](https://github.com/PaddlePaddle/Paddle/pull/68491), [#68629](https://github.com/PaddlePaddle/Paddle/pull/68629)
-- Sped up the PIR executor. [#69513](https://github.com/PaddlePaddle/Paddle/pull/69513)
-- Optimized PIR save and load performance. [#69683](https://github.com/PaddlePaddle/Paddle/pull/69683)
-- Device-specific optimizations. [#69676](https://github.com/PaddlePaddle/Paddle/pull/69676)
-- Removed redundant input/output information. [#66278](https://github.com/PaddlePaddle/Paddle/pull/66278)
+- SOT: improved execution efficiency and extended functionality by optimizing the Guard condition mechanism, strengthening dynamic shape handling, and adding `no_grad` support, while also improving code structure. [#70362](https://github.com/PaddlePaddle/Paddle/pull/70362), [#70154](https://github.com/PaddlePaddle/Paddle/pull/70154), [#71748](https://github.com/PaddlePaddle/Paddle/pull/71748), [#72004](https://github.com/PaddlePaddle/Paddle/pull/72004), [#72159](https://github.com/PaddlePaddle/Paddle/pull/72159), [#72174](https://github.com/PaddlePaddle/Paddle/pull/72174), [#71994](https://github.com/PaddlePaddle/Paddle/pull/71994), [#72250](https://github.com/PaddlePaddle/Paddle/pull/72250), [#72285](https://github.com/PaddlePaddle/Paddle/pull/72285), [#72322](https://github.com/PaddlePaddle/Paddle/pull/72322), [#72272](https://github.com/PaddlePaddle/Paddle/pull/72272), [#72417](https://github.com/PaddlePaddle/Paddle/pull/72417), [#72438](https://github.com/PaddlePaddle/Paddle/pull/72438), [#72462](https://github.com/PaddlePaddle/Paddle/pull/72462), [#72463](https://github.com/PaddlePaddle/Paddle/pull/72463), [#72503](https://github.com/PaddlePaddle/Paddle/pull/72503), [#72501](https://github.com/PaddlePaddle/Paddle/pull/72501), [#72521](https://github.com/PaddlePaddle/Paddle/pull/72521), [#72509](https://github.com/PaddlePaddle/Paddle/pull/72509), [#72544](https://github.com/PaddlePaddle/Paddle/pull/72544), [#73469](https://github.com/PaddlePaddle/Paddle/pull/73469), [#73471](https://github.com/PaddlePaddle/Paddle/pull/73471), [#73555](https://github.com/PaddlePaddle/Paddle/pull/73555)
-### Deprecated Features
+### Deprecations
-- Removed obsolete test cases. [#66269](https://github.com/PaddlePaddle/Paddle/pull/66269), [#66690](https://github.com/PaddlePaddle/Paddle/pull/66690), [#67505](https://github.com/PaddlePaddle/Paddle/pull/67505), [#67464](https://github.com/PaddlePaddle/Paddle/pull/67464), [#68400](https://github.com/PaddlePaddle/Paddle/pull/68400), [#68178](https://github.com/PaddlePaddle/Paddle/pull/68178), [#68194](https://github.com/PaddlePaddle/Paddle/pull/68194)
-- Cleaned up deprecated flags and configurations. [#69124](https://github.com/PaddlePaddle/Paddle/pull/69124), [#69176](https://github.com/PaddlePaddle/Paddle/pull/69176), [#69274](https://github.com/PaddlePaddle/Paddle/pull/69274), [#68384](https://github.com/PaddlePaddle/Paddle/pull/68384)
-- Retired old APIs. [#66032](https://github.com/PaddlePaddle/Paddle/pull/66032), [#67303](https://github.com/PaddlePaddle/Paddle/pull/67303)
-- Cleaned up redundant PIR strategies and unit tests. [#66366](https://github.com/PaddlePaddle/Paddle/pull/66366), [#70534](https://github.com/PaddlePaddle/Paddle/pull/70534), [#68444](https://github.com/PaddlePaddle/Paddle/pull/68444), [#70599](https://github.com/PaddlePaddle/Paddle/pull/70599), [#68801](https://github.com/PaddlePaddle/Paddle/pull/68801), [#66303](https://github.com/PaddlePaddle/Paddle/pull/66303), [#67854](https://github.com/PaddlePaddle/Paddle/pull/67854), [#70795](https://github.com/PaddlePaddle/Paddle/pull/70795)
-- Deprecated dynamic-to-static unit tests, APIs, and related items. [#66421](https://github.com/PaddlePaddle/Paddle/pull/66421), [#68251](https://github.com/PaddlePaddle/Paddle/pull/68251), [#68252](https://github.com/PaddlePaddle/Paddle/pull/68252), [#68253](https://github.com/PaddlePaddle/Paddle/pull/68253), [#68254](https://github.com/PaddlePaddle/Paddle/pull/68254), [#68409](https://github.com/PaddlePaddle/Paddle/pull/68409), [#70569](https://github.com/PaddlePaddle/Paddle/pull/70569), [#71279](https://github.com/PaddlePaddle/Paddle/pull/71279)
-- Deprecated auto-parallel unit tests. [#67857](https://github.com/PaddlePaddle/Paddle/pull/67857), [#67862](https://github.com/PaddlePaddle/Paddle/pull/67862), [#67995](https://github.com/PaddlePaddle/Paddle/pull/67995), [#68012](https://github.com/PaddlePaddle/Paddle/pull/68012), [#68013](https://github.com/PaddlePaddle/Paddle/pull/68013), [#67798](https://github.com/PaddlePaddle/Paddle/pull/67798)
+- Code cleanup: removed the Python 3.8 support declaration and completed the related code cleanup, dependency trimming, and syntax modernization to improve maintainability and compatibility. [#71815](https://github.com/PaddlePaddle/Paddle/pull/71815), [#72802](https://github.com/PaddlePaddle/Paddle/pull/72802), [#72856](https://github.com/PaddlePaddle/Paddle/pull/72856), [#72854](https://github.com/PaddlePaddle/Paddle/pull/72854), [#72855](https://github.com/PaddlePaddle/Paddle/pull/72855), [#72873](https://github.com/PaddlePaddle/Paddle/pull/72873), [#72870](https://github.com/PaddlePaddle/Paddle/pull/72870), [#72868](https://github.com/PaddlePaddle/Paddle/pull/72868), [#72891](https://github.com/PaddlePaddle/Paddle/pull/72891)
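-## 3. Compiler Architecture
+### Developer-related
-The CINN compiler has improved across the board in completeness and performance. In this release we optimized every stage of the compiler frontend and backend, including a new automatic Re-Compute mechanism for the backward computation graph, frontend Pass performance optimizations, an upgraded symbolic inference mechanism, improved operator fusion strategies, and stronger backend Schedule strategies and index expression simplification. We also tracked down and fixed a large number of correctness and performance issues, systematically improving the compiler's general optimization capability. With the CINN compiler enabled, more than 60% of the PaddlePaddle PaddleX series models show a significant performance gain over dynamic graph mode.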
+- Optimized CINN backend integration and dynamic shape handling, improved framework stability through code restructuring and stronger tests, and added debug logging to improve maintainability. [#71817](https://github.com/PaddlePaddle/Paddle/pull/71817), [#71896](https://github.com/PaddlePaddle/Paddle/pull/71896), [#71984](https://github.com/PaddlePaddle/Paddle/pull/71984), [#72067](https://github.com/PaddlePaddle/Paddle/pull/72067), [#72165](https://github.com/PaddlePaddle/Paddle/pull/72165), [#72207](https://github.com/PaddlePaddle/Paddle/pull/72207), [#72235](https://github.com/PaddlePaddle/Paddle/pull/72235), [#72273](https://github.com/PaddlePaddle/Paddle/pull/72273), [#72326](https://github.com/PaddlePaddle/Paddle/pull/72326), [#72400](https://github.com/PaddlePaddle/Paddle/pull/72400), [#72381](https://github.com/PaddlePaddle/Paddle/pull/72381), [#72560](https://github.com/PaddlePaddle/Paddle/pull/72560), [#72783](https://github.com/PaddlePaddle/Paddle/pull/72783), [#73530](https://github.com/PaddlePaddle/Paddle/pull/73530)
-### New Features
+### Others
-1. New hardware backend support: added support for the HIP and SYCL backends. ([#65146](https://github.com/PaddlePaddle/Paddle/pull/65146), [#65329](https://github.com/PaddlePaddle/Paddle/pull/65329), [#69554](https://github.com/PaddlePaddle/Paddle/pull/69554), [#71204](https://github.com/PaddlePaddle/Paddle/pull/71204), [#65438](https://github.com/PaddlePaddle/Paddle/pull/65438), [#66476](https://github.com/PaddlePaddle/Paddle/pull/66476), [#66620](https://github.com/PaddlePaddle/Paddle/pull/66620), [#67813](https://github.com/PaddlePaddle/Paddle/pull/67813))
-2. Added support for manually setting information such as value ranges and equality constraints of symbolic dimensions in inference scenarios. ([#67628](https://github.com/PaddlePaddle/Paddle/pull/67628), [#67384](https://github.com/PaddlePaddle/Paddle/pull/67384))
+- Others: added FP16/BF16 data type support to some CPU kernels, and improved error handling and tolerance configuration in the test modules. [#71764](https://github.com/PaddlePaddle/Paddle/pull/71764), [#71951](https://github.com/PaddlePaddle/Paddle/pull/71951), [#72944](https://github.com/PaddlePaddle/Paddle/pull/72944)
-### Functional Improvements
+## 3. Compiler Architecture
-1. Improved error message output for a better development and debugging experience. ([#67738](https://github.com/PaddlePaddle/Paddle/pull/67738), [#68769](https://github.com/PaddlePaddle/Paddle/pull/68769), [#71076](https://github.com/PaddlePaddle/Paddle/pull/71076))
-2. Added support for the Welford algorithm, which preserves both the performance and the precision of BatchNorm-related operator Kernels. ([#71184](https://github.com/PaddlePaddle/Paddle/pull/71184), [#71057](https://github.com/PaddlePaddle/Paddle/pull/71057))
+Improved compiler performance and stability.
### Performance Optimization
-1. Added backend optimization strategies such as GridReduce, Loop merging, Transpose tuning, and automatic vectorization, significantly improving Kernel performance across dimension spaces and hardware configurations. ([#67236](https://github.com/PaddlePaddle/Paddle/pull/67236), [#68897](https://github.com/PaddlePaddle/Paddle/pull/68897), [#69409](https://github.com/PaddlePaddle/Paddle/pull/69409), [#65336](https://github.com/PaddlePaddle/Paddle/pull/65336), [#66419](https://github.com/PaddlePaddle/Paddle/pull/66419), [#68338](https://github.com/PaddlePaddle/Paddle/pull/68338), [#68364](https://github.com/PaddlePaddle/Paddle/pull/68364), [#71087](https://github.com/PaddlePaddle/Paddle/pull/71087), [#68019](https://github.com/PaddlePaddle/Paddle/pull/68019), [#68122](https://github.com/PaddlePaddle/Paddle/pull/68122), [#65187](https://github.com/PaddlePaddle/Paddle/pull/65187), [#66742](https://github.com/PaddlePaddle/Paddle/pull/66742), [#67083](https://github.com/PaddlePaddle/Paddle/pull/67083), [#68667](https://github.com/PaddlePaddle/Paddle/pull/68667), [#68750](https://github.com/PaddlePaddle/Paddle/pull/68750), [#69376](https://github.com/PaddlePaddle/Paddle/pull/69376), [#69350](https://github.com/PaddlePaddle/Paddle/pull/69350), [#69740](https://github.com/PaddlePaddle/Paddle/pull/69740), [#68918](https://github.com/PaddlePaddle/Paddle/pull/68918), [#70092](https://github.com/PaddlePaddle/Paddle/pull/70092), [#69607](https://github.com/PaddlePaddle/Paddle/pull/69607), [#69794](https://github.com/PaddlePaddle/Paddle/pull/69794), [#70258](https://github.com/PaddlePaddle/Paddle/pull/70258), [#70547](https://github.com/PaddlePaddle/Paddle/pull/70547), [#70581](https://github.com/PaddlePaddle/Paddle/pull/70581), [#70649](https://github.com/PaddlePaddle/Paddle/pull/70649), [#69732](https://github.com/PaddlePaddle/Paddle/pull/69732), [#70786](https://github.com/PaddlePaddle/Paddle/pull/70786), [#70942](https://github.com/PaddlePaddle/Paddle/pull/70942), [#71014](https://github.com/PaddlePaddle/Paddle/pull/71014), [#71263](https://github.com/PaddlePaddle/Paddle/pull/71263), [#71249](https://github.com/PaddlePaddle/Paddle/pull/71249), [#71340](https://github.com/PaddlePaddle/Paddle/pull/71340), [#71301](https://github.com/PaddlePaddle/Paddle/pull/71301), [#71380](https://github.com/PaddlePaddle/Paddle/pull/71380))
-2. Optimized operator fusion strategies, upgrading horizontal fusion, multi-downstream fusion, Reshape-alignment fusion, and others to further strengthen operator fusion and improve end-to-end performance. ([#66034](https://github.com/PaddlePaddle/Paddle/pull/66034), [#67829](https://github.com/PaddlePaddle/Paddle/pull/67829), [#68171](https://github.com/PaddlePaddle/Paddle/pull/68171), [#69478](https://github.com/PaddlePaddle/Paddle/pull/69478), [#69691](https://github.com/PaddlePaddle/Paddle/pull/69691), [#70665](https://github.com/PaddlePaddle/Paddle/pull/70665), [#71103](https://github.com/PaddlePaddle/Paddle/pull/71103), [#70873](https://github.com/PaddlePaddle/Paddle/pull/70873))
-3. Upgraded the backend's index expression simplification, which now handles complex expressions over dynamic and static dimensions and significantly reduces index computation overhead in generated Kernels. ([#68011](https://github.com/PaddlePaddle/Paddle/pull/68011), [#68617](https://github.com/PaddlePaddle/Paddle/pull/68617), [#68624](https://github.com/PaddlePaddle/Paddle/pull/68624), [#68685](https://github.com/PaddlePaddle/Paddle/pull/68685), [#68220](https://github.com/PaddlePaddle/Paddle/pull/68220), [#68720](https://github.com/PaddlePaddle/Paddle/pull/68720), [#68753](https://github.com/PaddlePaddle/Paddle/pull/68753), [#68986](https://github.com/PaddlePaddle/Paddle/pull/68986), [#68987](https://github.com/PaddlePaddle/Paddle/pull/68987), [#69071](https://github.com/PaddlePaddle/Paddle/pull/69071), [#69164](https://github.com/PaddlePaddle/Paddle/pull/69164), [#69282](https://github.com/PaddlePaddle/Paddle/pull/69282), [#69522](https://github.com/PaddlePaddle/Paddle/pull/69522), [#69857](https://github.com/PaddlePaddle/Paddle/pull/69857), [#70208](https://github.com/PaddlePaddle/Paddle/pull/70208), [#70355](https://github.com/PaddlePaddle/Paddle/pull/70355), [#70427](https://github.com/PaddlePaddle/Paddle/pull/70427), [#70450](https://github.com/PaddlePaddle/Paddle/pull/70450), [#68737](https://github.com/PaddlePaddle/Paddle/pull/68737), [#70500](https://github.com/PaddlePaddle/Paddle/pull/70500), [#70953](https://github.com/PaddlePaddle/Paddle/pull/70953), [#70933](https://github.com/PaddlePaddle/Paddle/pull/70933), [#71026](https://github.com/PaddlePaddle/Paddle/pull/71026), [#70456](https://github.com/PaddlePaddle/Paddle/pull/70456), [#70257](https://github.com/PaddlePaddle/Paddle/pull/70257), [#70461](https://github.com/PaddlePaddle/Paddle/pull/70461), [#70142](https://github.com/PaddlePaddle/Paddle/pull/70142), [#71018](https://github.com/PaddlePaddle/Paddle/pull/71018), [#71278](https://github.com/PaddlePaddle/Paddle/pull/71278))
-4. Added an automatic Re-Compute mechanism for the backward computation graph, which reduces GPU memory usage during training and improves performance. ([#69342](https://github.com/PaddlePaddle/Paddle/pull/69342), [#70255](https://github.com/PaddlePaddle/Paddle/pull/70255), [#68241](https://github.com/PaddlePaddle/Paddle/pull/68241), [#69954](https://github.com/PaddlePaddle/Paddle/pull/69954), [#70832](https://github.com/PaddlePaddle/Paddle/pull/70832))
-5. Optimized the backend Host and Device code compilation flow to reduce compile time, and improved branch handling performance in Broadcast scenarios. ([#65669](https://github.com/PaddlePaddle/Paddle/pull/65669), [#65916](https://github.com/PaddlePaddle/Paddle/pull/65916), [#66109](https://github.com/PaddlePaddle/Paddle/pull/66109), [#65611](https://github.com/PaddlePaddle/Paddle/pull/65611), [#65990](https://github.com/PaddlePaddle/Paddle/pull/65990), [#66088](https://github.com/PaddlePaddle/Paddle/pull/66088), [#66207](https://github.com/PaddlePaddle/Paddle/pull/66207), [#66537](https://github.com/PaddlePaddle/Paddle/pull/66537), [#66768](https://github.com/PaddlePaddle/Paddle/pull/66768), [#70685](https://github.com/PaddlePaddle/Paddle/pull/70685), [#71410](https://github.com/PaddlePaddle/Paddle/pull/71410), [#66062](https://github.com/PaddlePaddle/Paddle/pull/66062))
-6. 完善升级了动态维度的符号推导、化简、缓存等机制,添加了所有常规算子(580+)的符号推导接口实现,为 Kernel 编译提供更多约束信息。([#65343](https://github.com/PaddlePaddle/Paddle/pull/65343)、[#66582](https://github.com/PaddlePaddle/Paddle/pull/66582)、[#65500](https://github.com/PaddlePaddle/Paddle/pull/65500)、[#65591](https://github.com/PaddlePaddle/Paddle/pull/65591)、[#66637](https://github.com/PaddlePaddle/Paddle/pull/66637)、[#68208](https://github.com/PaddlePaddle/Paddle/pull/68208)、[#68056](https://github.com/PaddlePaddle/Paddle/pull/68056)、[#68015](https://github.com/PaddlePaddle/Paddle/pull/68015)、[#68096](https://github.com/PaddlePaddle/Paddle/pull/68096)、[#68236](https://github.com/PaddlePaddle/Paddle/pull/68236)、[#68973](https://github.com/PaddlePaddle/Paddle/pull/68973)、[#68967](https://github.com/PaddlePaddle/Paddle/pull/68967)、[#69133](https://github.com/PaddlePaddle/Paddle/pull/69133)、[#68550](https://github.com/PaddlePaddle/Paddle/pull/68550)、[#68882](https://github.com/PaddlePaddle/Paddle/pull/68882)、[#69005](https://github.com/PaddlePaddle/Paddle/pull/69005)、[#69911](https://github.com/PaddlePaddle/Paddle/pull/69911)、[#70376](https://github.com/PaddlePaddle/Paddle/pull/70376)、[#71153](https://github.com/PaddlePaddle/Paddle/pull/71153)、[#66644](https://github.com/PaddlePaddle/Paddle/pull/66644)、[#66650](https://github.com/PaddlePaddle/Paddle/pull/66650)、[#66642](https://github.com/PaddlePaddle/Paddle/pull/66642)、[#66729](https://github.com/PaddlePaddle/Paddle/pull/66729)、[#66838](https://github.com/PaddlePaddle/Paddle/pull/66838)、[#66762](https://github.com/PaddlePaddle/Paddle/pull/66762)、[#66580](https://github.com/PaddlePaddle/Paddle/pull/66580)、[#66612](https://github.com/PaddlePaddle/Paddle/pull/66612)、[#66625](https://github.com/PaddlePaddle/Paddle/pull/66625)、[#66643](https://github.com/PaddlePaddle/Paddle/pull/66643)、[#66837](https://github.com/PaddlePaddle/Paddle/pull/66837)、[#66946](https://github.com/PaddlePaddle/Paddle/pull/66946)、[#67018](https://github.com/PaddlePaddle/Paddle/pull/67018)、[#6704
9](https://github.com/PaddlePaddle/Paddle/pull/67049)、[#66956](https://github.com/PaddlePaddle/Paddle/pull/66956)、[#67008](https://github.com/PaddlePaddle/Paddle/pull/67008)、[#66930](https://github.com/PaddlePaddle/Paddle/pull/66930)、[#66877](https://github.com/PaddlePaddle/Paddle/pull/66877)、[#66896](https://github.com/PaddlePaddle/Paddle/pull/66896)、[#67120](https://github.com/PaddlePaddle/Paddle/pull/67120)、[#67117](https://github.com/PaddlePaddle/Paddle/pull/67117)、[#67098](https://github.com/PaddlePaddle/Paddle/pull/67098)、[#67136](https://github.com/PaddlePaddle/Paddle/pull/67136)、[#67294](https://github.com/PaddlePaddle/Paddle/pull/67294)、[#67327](https://github.com/PaddlePaddle/Paddle/pull/67327)、[#66827](https://github.com/PaddlePaddle/Paddle/pull/66827)、[#67201](https://github.com/PaddlePaddle/Paddle/pull/67201)、[#66892](https://github.com/PaddlePaddle/Paddle/pull/66892)、[#67377](https://github.com/PaddlePaddle/Paddle/pull/67377)、[#66619](https://github.com/PaddlePaddle/Paddle/pull/66619)、[#67037](https://github.com/PaddlePaddle/Paddle/pull/67037)、[#67412](https://github.com/PaddlePaddle/Paddle/pull/67412)、[#67394](https://github.com/PaddlePaddle/Paddle/pull/67394)、[#67374](https://github.com/PaddlePaddle/Paddle/pull/67374)、[#67418](https://github.com/PaddlePaddle/Paddle/pull/67418)、[#67348](https://github.com/PaddlePaddle/Paddle/pull/67348)、[#67337](https://github.com/PaddlePaddle/Paddle/pull/67337)、[#67390](https://github.com/PaddlePaddle/Paddle/pull/67390)、[#67407](https://github.com/PaddlePaddle/Paddle/pull/67407)、[#67491](https://github.com/PaddlePaddle/Paddle/pull/67491)、[#67422](https://github.com/PaddlePaddle/Paddle/pull/67422)、[#67461](https://github.com/PaddlePaddle/Paddle/pull/67461)、[#67458](https://github.com/PaddlePaddle/Paddle/pull/67458)、[#67486](https://github.com/PaddlePaddle/Paddle/pull/67486)、[#67490](https://github.com/PaddlePaddle/Paddle/pull/67490)、[#67462](https://github.com/PaddlePaddle/Paddle/pull/67462)、[#67364](https://github.co
m/PaddlePaddle/Paddle/pull/67364)、[#67435](https://github.com/PaddlePaddle/Paddle/pull/67435)、[#67665](https://github.com/PaddlePaddle/Paddle/pull/67665)、[#67426](https://github.com/PaddlePaddle/Paddle/pull/67426)、[#67507](https://github.com/PaddlePaddle/Paddle/pull/67507)、[#67730](https://github.com/PaddlePaddle/Paddle/pull/67730)、[#67776](https://github.com/PaddlePaddle/Paddle/pull/67776)、[#67806](https://github.com/PaddlePaddle/Paddle/pull/67806)、[#67803](https://github.com/PaddlePaddle/Paddle/pull/67803)、[#67788](https://github.com/PaddlePaddle/Paddle/pull/67788)、[#67705](https://github.com/PaddlePaddle/Paddle/pull/67705)、[#67814](https://github.com/PaddlePaddle/Paddle/pull/67814)、[#67858](https://github.com/PaddlePaddle/Paddle/pull/67858)、[#67751](https://github.com/PaddlePaddle/Paddle/pull/67751)、[#67875](https://github.com/PaddlePaddle/Paddle/pull/67875)、[#67663](https://github.com/PaddlePaddle/Paddle/pull/67663)、[#67434](https://github.com/PaddlePaddle/Paddle/pull/67434)、[#67818](https://github.com/PaddlePaddle/Paddle/pull/67818)、[#68180](https://github.com/PaddlePaddle/Paddle/pull/68180)、[#68547](https://github.com/PaddlePaddle/Paddle/pull/68547)、[#68548](https://github.com/PaddlePaddle/Paddle/pull/68548)、[#68670](https://github.com/PaddlePaddle/Paddle/pull/68670)、[#68964](https://github.com/PaddlePaddle/Paddle/pull/68964)、[#68929](https://github.com/PaddlePaddle/Paddle/pull/68929)、[#68907](https://github.com/PaddlePaddle/Paddle/pull/68907)、[#68917](https://github.com/PaddlePaddle/Paddle/pull/68917)、[#68984](https://github.com/PaddlePaddle/Paddle/pull/68984)、[#68644](https://github.com/PaddlePaddle/Paddle/pull/68644)、[#69167](https://github.com/PaddlePaddle/Paddle/pull/69167)、[#68975](https://github.com/PaddlePaddle/Paddle/pull/68975)、[#68947](https://github.com/PaddlePaddle/Paddle/pull/68947)、[#68978](https://github.com/PaddlePaddle/Paddle/pull/68978)、[#68980](https://github.com/PaddlePaddle/Paddle/pull/68980)、[#68979](https://github.com/PaddlePaddle/Paddl
e/pull/68979)、[#69329](https://github.com/PaddlePaddle/Paddle/pull/69329)、[#69055](https://github.com/PaddlePaddle/Paddle/pull/69055)、[#69331](https://github.com/PaddlePaddle/Paddle/pull/69331)、[#69414](https://github.com/PaddlePaddle/Paddle/pull/69414)、[#69335](https://github.com/PaddlePaddle/Paddle/pull/69335)、[#69017](https://github.com/PaddlePaddle/Paddle/pull/69017)、[#69344](https://github.com/PaddlePaddle/Paddle/pull/69344)、[#69069](https://github.com/PaddlePaddle/Paddle/pull/69069)、[#69698](https://github.com/PaddlePaddle/Paddle/pull/69698)、[#69919](https://github.com/PaddlePaddle/Paddle/pull/69919)、[#69964](https://github.com/PaddlePaddle/Paddle/pull/69964)、[#70337](https://github.com/PaddlePaddle/Paddle/pull/70337)、[#70282](https://github.com/PaddlePaddle/Paddle/pull/70282)、[#70741](https://github.com/PaddlePaddle/Paddle/pull/70741)、[#70818](https://github.com/PaddlePaddle/Paddle/pull/70818)、[#71031](https://github.com/PaddlePaddle/Paddle/pull/71031)、[#70541](https://github.com/PaddlePaddle/Paddle/pull/70541)、[#66609](https://github.com/PaddlePaddle/Paddle/pull/66609)、[#66889](https://github.com/PaddlePaddle/Paddle/pull/66889)、[#66633](https://github.com/PaddlePaddle/Paddle/pull/66633)、[#66735](https://github.com/PaddlePaddle/Paddle/pull/66735)、[#66935](https://github.com/PaddlePaddle/Paddle/pull/66935)、[#66627](https://github.com/PaddlePaddle/Paddle/pull/66627)、[#66730](https://github.com/PaddlePaddle/Paddle/pull/66730)、[#67210](https://github.com/PaddlePaddle/Paddle/pull/67210)、[#67115](https://github.com/PaddlePaddle/Paddle/pull/67115)、[#67275](https://github.com/PaddlePaddle/Paddle/pull/67275)、[#67472](https://github.com/PaddlePaddle/Paddle/pull/67472)、[#67577](https://github.com/PaddlePaddle/Paddle/pull/67577)、[#67328](https://github.com/PaddlePaddle/Paddle/pull/67328)、[#67566](https://github.com/PaddlePaddle/Paddle/pull/67566)、[#67451](https://github.com/PaddlePaddle/Paddle/pull/67451)、[#68098](https://github.com/PaddlePaddle/Paddle/pull/68098)、[#6822
5](https://github.com/PaddlePaddle/Paddle/pull/68225)、[#68177](https://github.com/PaddlePaddle/Paddle/pull/68177)、[#68102](https://github.com/PaddlePaddle/Paddle/pull/68102)、[#67951](https://github.com/PaddlePaddle/Paddle/pull/67951)、[#67957](https://github.com/PaddlePaddle/Paddle/pull/67957)、[#68235](https://github.com/PaddlePaddle/Paddle/pull/68235)、[#68447](https://github.com/PaddlePaddle/Paddle/pull/68447)、[#68446](https://github.com/PaddlePaddle/Paddle/pull/68446)、[#68183](https://github.com/PaddlePaddle/Paddle/pull/68183)、[#68318](https://github.com/PaddlePaddle/Paddle/pull/68318)、[#68385](https://github.com/PaddlePaddle/Paddle/pull/68385)、[#67635](https://github.com/PaddlePaddle/Paddle/pull/67635)、[#65623](https://github.com/PaddlePaddle/Paddle/pull/65623)、[#65956](https://github.com/PaddlePaddle/Paddle/pull/65956)、[#66063](https://github.com/PaddlePaddle/Paddle/pull/66063)、[#65992](https://github.com/PaddlePaddle/Paddle/pull/65992)、[#65880](https://github.com/PaddlePaddle/Paddle/pull/65880)、[#66343](https://github.com/PaddlePaddle/Paddle/pull/66343)、[#65889](https://github.com/PaddlePaddle/Paddle/pull/65889)、[#66606](https://github.com/PaddlePaddle/Paddle/pull/66606)、[#66618](https://github.com/PaddlePaddle/Paddle/pull/66618)、[#66737](https://github.com/PaddlePaddle/Paddle/pull/66737)、[#66607](https://github.com/PaddlePaddle/Paddle/pull/66607)、[#66579](https://github.com/PaddlePaddle/Paddle/pull/66579)、[#66732](https://github.com/PaddlePaddle/Paddle/pull/66732)、[#66849](https://github.com/PaddlePaddle/Paddle/pull/66849)、[#66400](https://github.com/PaddlePaddle/Paddle/pull/66400)、[#66952](https://github.com/PaddlePaddle/Paddle/pull/66952)、[#66570](https://github.com/PaddlePaddle/Paddle/pull/66570)、[#66967](https://github.com/PaddlePaddle/Paddle/pull/66967)、[#66595](https://github.com/PaddlePaddle/Paddle/pull/66595)、[#67121](https://github.com/PaddlePaddle/Paddle/pull/67121)、[#67206](https://github.com/PaddlePaddle/Paddle/pull/67206)、[#67444](https://github.co
m/PaddlePaddle/Paddle/pull/67444)、[#67494](https://github.com/PaddlePaddle/Paddle/pull/67494)、[#67499](https://github.com/PaddlePaddle/Paddle/pull/67499)、[#67267](https://github.com/PaddlePaddle/Paddle/pull/67267)、[#67567](https://github.com/PaddlePaddle/Paddle/pull/67567)、[#67455](https://github.com/PaddlePaddle/Paddle/pull/67455)、[#67161](https://github.com/PaddlePaddle/Paddle/pull/67161)、[#67581](https://github.com/PaddlePaddle/Paddle/pull/67581)、[#67539](https://github.com/PaddlePaddle/Paddle/pull/67539)、[#67625](https://github.com/PaddlePaddle/Paddle/pull/67625)、[#67690](https://github.com/PaddlePaddle/Paddle/pull/67690)、[#67454](https://github.com/PaddlePaddle/Paddle/pull/67454)、[#67731](https://github.com/PaddlePaddle/Paddle/pull/67731)、[#67734](https://github.com/PaddlePaddle/Paddle/pull/67734)、[#67735](https://github.com/PaddlePaddle/Paddle/pull/67735)、[#67607](https://github.com/PaddlePaddle/Paddle/pull/67607)、[#67413](https://github.com/PaddlePaddle/Paddle/pull/67413)、[#67387](https://github.com/PaddlePaddle/Paddle/pull/67387)、[#67882](https://github.com/PaddlePaddle/Paddle/pull/67882)、[#67864](https://github.com/PaddlePaddle/Paddle/pull/67864)、[#67503](https://github.com/PaddlePaddle/Paddle/pull/67503)、[#67861](https://github.com/PaddlePaddle/Paddle/pull/67861)、[#67888](https://github.com/PaddlePaddle/Paddle/pull/67888)、[#67884](https://github.com/PaddlePaddle/Paddle/pull/67884)、[#67826](https://github.com/PaddlePaddle/Paddle/pull/67826)、[#68044](https://github.com/PaddlePaddle/Paddle/pull/68044)、[#67851](https://github.com/PaddlePaddle/Paddle/pull/67851)、[#68276](https://github.com/PaddlePaddle/Paddle/pull/68276)、[#69888](https://github.com/PaddlePaddle/Paddle/pull/69888)、[#70093](https://github.com/PaddlePaddle/Paddle/pull/70093)、[#70436](https://github.com/PaddlePaddle/Paddle/pull/70436)、[#70914](https://github.com/PaddlePaddle/Paddle/pull/70914)、[#71222](https://github.com/PaddlePaddle/Paddle/pull/71222))
-7. Optimized some frontend passes to make the frontend processing flow more robust and to improve the performance of compute-intensive subgraphs. ([#65142](https://github.com/PaddlePaddle/Paddle/pull/65142), [#67466](https://github.com/PaddlePaddle/Paddle/pull/67466), [#69228](https://github.com/PaddlePaddle/Paddle/pull/69228), [#70994](https://github.com/PaddlePaddle/Paddle/pull/70994), [#71226](https://github.com/PaddlePaddle/Paddle/pull/71226), [#71297](https://github.com/PaddlePaddle/Paddle/pull/71297), [#71443](https://github.com/PaddlePaddle/Paddle/pull/71443))
-8. Designed new backend IR base components and related pass interfaces that offer a simpler and more efficient way to develop optimization strategies; an automatic pruning strategy also reduces the traversal overhead of the backend IR. ([#70485](https://github.com/PaddlePaddle/Paddle/pull/70485), [#70765](https://github.com/PaddlePaddle/Paddle/pull/70765), [#71042](https://github.com/PaddlePaddle/Paddle/pull/71042), [#70952](https://github.com/PaddlePaddle/Paddle/pull/70952), [#69454](https://github.com/PaddlePaddle/Paddle/pull/69454), [#70361](https://github.com/PaddlePaddle/Paddle/pull/70361), [#70334](https://github.com/PaddlePaddle/Paddle/pull/70334), [#70406](https://github.com/PaddlePaddle/Paddle/pull/70406), [#70191](https://github.com/PaddlePaddle/Paddle/pull/70191), [#70462](https://github.com/PaddlePaddle/Paddle/pull/70462), [#70548](https://github.com/PaddlePaddle/Paddle/pull/70548), [#70592](https://github.com/PaddlePaddle/Paddle/pull/70592), [#70437](https://github.com/PaddlePaddle/Paddle/pull/70437), [#70619](https://github.com/PaddlePaddle/Paddle/pull/70619), [#70543](https://github.com/PaddlePaddle/Paddle/pull/70543), [#69611](https://github.com/PaddlePaddle/Paddle/pull/69611), [#70739](https://github.com/PaddlePaddle/Paddle/pull/70739), [#70533](https://github.com/PaddlePaddle/Paddle/pull/70533), [#70696](https://github.com/PaddlePaddle/Paddle/pull/70696), [#70498](https://github.com/PaddlePaddle/Paddle/pull/70498), [#70829](https://github.com/PaddlePaddle/Paddle/pull/70829), [#71111](https://github.com/PaddlePaddle/Paddle/pull/71111), [#70883](https://github.com/PaddlePaddle/Paddle/pull/70883))
+- Added support for automatic layout conversion optimization in training scenarios. ([#71891](https://github.com/PaddlePaddle/Paddle/pull/71891))
+- Added kernel compilation optimizations in the backend for operators such as argmin, argmax, and arange. ([#71956](https://github.com/PaddlePaddle/Paddle/pull/71956), [#72598](https://github.com/PaddlePaddle/Paddle/pull/72598))
+- Added support for fusion optimization of matrix multiplication. ([#72846](https://github.com/PaddlePaddle/Paddle/pull/72846))
+- Optimized the kernel computation performance of some operators. ([#72871](https://github.com/PaddlePaddle/Paddle/pull/72871))
### Bug Fixes
-1. Fixed bugs in the symbolic inference logic of some operators. ([#65185](https://github.com/PaddlePaddle/Paddle/pull/65185), [#65231](https://github.com/PaddlePaddle/Paddle/pull/65231), [#65266](https://github.com/PaddlePaddle/Paddle/pull/65266), [#65951](https://github.com/PaddlePaddle/Paddle/pull/65951), [#67142](https://github.com/PaddlePaddle/Paddle/pull/67142), [#67286](https://github.com/PaddlePaddle/Paddle/pull/67286), [#65958](https://github.com/PaddlePaddle/Paddle/pull/65958), [#65955](https://github.com/PaddlePaddle/Paddle/pull/65955), [#66470](https://github.com/PaddlePaddle/Paddle/pull/66470), [#66764](https://github.com/PaddlePaddle/Paddle/pull/66764), [#66036](https://github.com/PaddlePaddle/Paddle/pull/66036), [#66662](https://github.com/PaddlePaddle/Paddle/pull/66662), [#66741](https://github.com/PaddlePaddle/Paddle/pull/66741), [#66745](https://github.com/PaddlePaddle/Paddle/pull/66745), [#66807](https://github.com/PaddlePaddle/Paddle/pull/66807), [#66791](https://github.com/PaddlePaddle/Paddle/pull/66791), [#66859](https://github.com/PaddlePaddle/Paddle/pull/66859), [#66880](https://github.com/PaddlePaddle/Paddle/pull/66880), [#66962](https://github.com/PaddlePaddle/Paddle/pull/66962))
-2. Fixed bugs when lowering some special operators to the compiler. ([#68698](https://github.com/PaddlePaddle/Paddle/pull/68698), [#68699](https://github.com/PaddlePaddle/Paddle/pull/68699), [#68691](https://github.com/PaddlePaddle/Paddle/pull/68691), [#68948](https://github.com/PaddlePaddle/Paddle/pull/68948), [#70144](https://github.com/PaddlePaddle/Paddle/pull/70144), [#70895](https://github.com/PaddlePaddle/Paddle/pull/70895))
-3. Fixed errors raised by operator fusion in some scenarios. ([#67038](https://github.com/PaddlePaddle/Paddle/pull/67038), [#67400](https://github.com/PaddlePaddle/Paddle/pull/67400), [#67655](https://github.com/PaddlePaddle/Paddle/pull/67655), [#67723](https://github.com/PaddlePaddle/Paddle/pull/67723), [#68029](https://github.com/PaddlePaddle/Paddle/pull/68029), [#68042](https://github.com/PaddlePaddle/Paddle/pull/68042), [#68888](https://github.com/PaddlePaddle/Paddle/pull/68888), [#69250](https://github.com/PaddlePaddle/Paddle/pull/69250), [#69937](https://github.com/PaddlePaddle/Paddle/pull/69937), [#70924](https://github.com/PaddlePaddle/Paddle/pull/70924))
-4. Fixed correctness issues in the backend when handling extreme values, improving compiler robustness. ([#68327](https://github.com/PaddlePaddle/Paddle/pull/68327))
-5. Fixed implementation logic bugs in the backend Schedule and post-processing tuning, resolving errors and performance problems in some cases. ([#68605](https://github.com/PaddlePaddle/Paddle/pull/68605), [#68937](https://github.com/PaddlePaddle/Paddle/pull/68937), [#68587](https://github.com/PaddlePaddle/Paddle/pull/68587), [#69060](https://github.com/PaddlePaddle/Paddle/pull/69060), [#69608](https://github.com/PaddlePaddle/Paddle/pull/69608), [#71471](https://github.com/PaddlePaddle/Paddle/pull/71471), [#71068](https://github.com/PaddlePaddle/Paddle/pull/71068))
-6. Resolved nondeterminism in the operator fusion process. ([#69547](https://github.com/PaddlePaddle/Paddle/pull/69547), [#70931](https://github.com/PaddlePaddle/Paddle/pull/70931))
+Fixed assorted processing-logic bugs across various scenarios. ([#71813](https://github.com/PaddlePaddle/Paddle/pull/71813), [#71886](https://github.com/PaddlePaddle/Paddle/pull/71886), [#71927](https://github.com/PaddlePaddle/Paddle/pull/71927), [#71915](https://github.com/PaddlePaddle/Paddle/pull/71915), [#71946](https://github.com/PaddlePaddle/Paddle/pull/71946), [#71949](https://github.com/PaddlePaddle/Paddle/pull/71949), [#71955](https://github.com/PaddlePaddle/Paddle/pull/71955), [#71942](https://github.com/PaddlePaddle/Paddle/pull/71942), [#71939](https://github.com/PaddlePaddle/Paddle/pull/71939), [#71973](https://github.com/PaddlePaddle/Paddle/pull/71973), [#72001](https://github.com/PaddlePaddle/Paddle/pull/72001), [#72020](https://github.com/PaddlePaddle/Paddle/pull/72020), [#72014](https://github.com/PaddlePaddle/Paddle/pull/72014), [#72021](https://github.com/PaddlePaddle/Paddle/pull/72021), [#72027](https://github.com/PaddlePaddle/Paddle/pull/72027), [#72061](https://github.com/PaddlePaddle/Paddle/pull/72061), [#72025](https://github.com/PaddlePaddle/Paddle/pull/72025), [#72095](https://github.com/PaddlePaddle/Paddle/pull/72095), [#72108](https://github.com/PaddlePaddle/Paddle/pull/72108), [#72132](https://github.com/PaddlePaddle/Paddle/pull/72132), [#71985](https://github.com/PaddlePaddle/Paddle/pull/71985), [#72106](https://github.com/PaddlePaddle/Paddle/pull/72106), [#72140](https://github.com/PaddlePaddle/Paddle/pull/72140), [#72167](https://github.com/PaddlePaddle/Paddle/pull/72167), [#72037](https://github.com/PaddlePaddle/Paddle/pull/72037), [#72178](https://github.com/PaddlePaddle/Paddle/pull/72178), [#72143](https://github.com/PaddlePaddle/Paddle/pull/72143), [#72175](https://github.com/PaddlePaddle/Paddle/pull/72175), [#72191](https://github.com/PaddlePaddle/Paddle/pull/72191), [#72213](https://github.com/PaddlePaddle/Paddle/pull/72213), [#72189](https://github.com/PaddlePaddle/Paddle/pull/72189), [#72214](https://github.com/PaddlePaddle/Paddle/pull/72214),
[#72166](https://github.com/PaddlePaddle/Paddle/pull/72166), [#72180](https://github.com/PaddlePaddle/Paddle/pull/72180), [#72284](https://github.com/PaddlePaddle/Paddle/pull/72284), [#72267](https://github.com/PaddlePaddle/Paddle/pull/72267), [#72348](https://github.com/PaddlePaddle/Paddle/pull/72348), [#72332](https://github.com/PaddlePaddle/Paddle/pull/72332), [#72307](https://github.com/PaddlePaddle/Paddle/pull/72307), [#72353](https://github.com/PaddlePaddle/Paddle/pull/72353), [#72204](https://github.com/PaddlePaddle/Paddle/pull/72204), [#72457](https://github.com/PaddlePaddle/Paddle/pull/72457), [#72426](https://github.com/PaddlePaddle/Paddle/pull/72426), [#72536](https://github.com/PaddlePaddle/Paddle/pull/72536), [#72541](https://github.com/PaddlePaddle/Paddle/pull/72541), [#72365](https://github.com/PaddlePaddle/Paddle/pull/72365), [#72621](https://github.com/PaddlePaddle/Paddle/pull/72621), [#72630](https://github.com/PaddlePaddle/Paddle/pull/72630), [#72669](https://github.com/PaddlePaddle/Paddle/pull/72669), [#72682](https://github.com/PaddlePaddle/Paddle/pull/72682), [#72732](https://github.com/PaddlePaddle/Paddle/pull/72732), [#72811](https://github.com/PaddlePaddle/Paddle/pull/72811), [#72941](https://github.com/PaddlePaddle/Paddle/pull/72941), [#72795](https://github.com/PaddlePaddle/Paddle/pull/72795), [#73536](https://github.com/PaddlePaddle/Paddle/pull/73536))
## 4. Auto Parallel Architecture
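-In the official 3.0 release, we thoroughly validated and polished the auto parallel architecture to better support the pre-training and fine-tuning workflows of common large-model scenarios, including text-only dense models, text-only sparse models (MoE), and multimodal understanding models. Specifically, we added sharding inference rules for 20+ operators for these scenarios and support converting parameters trained with auto parallel into manual-parallel parameters for downstream inference. This brings auto parallel to a fully usable state and helps users reduce the cost of developing parallel programs for large models. To further simplify distributed development, we introduced a new `paddle.distributed.parallel` interface that wraps the distributed tensor marking syntax and lets users configure common parallel strategies such as data parallelism, model parallelism, and pipeline parallelism non-intrusively, outside the model definition. In addition, the static graph auto parallel architecture has been fully upgraded on top of PIR: the underlying basic components, core modules, parallel strategies, and performance optimization strategies are all implemented uniformly on the extended PIR `DistDialect`. This further strengthens the dynamic-static consistency of auto parallel, and performance on the Llama model series matches or even exceeds that of manual parallelism.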
-
-### New Features
-
-- Added the `paddle.distributed.parallel` interface, which supports configuring common parallel strategies outside the model definition and simplifies distributed development. [#69004](https://github.com/PaddlePaddle/Paddle/pull/69004), [#69033](https://github.com/PaddlePaddle/Paddle/pull/69033), [#69077](https://github.com/PaddlePaddle/Paddle/pull/69077), [#69136](https://github.com/PaddlePaddle/Paddle/pull/69136), [#69169](https://github.com/PaddlePaddle/Paddle/pull/69169), [#69212](https://github.com/PaddlePaddle/Paddle/pull/69212), [#69217](https://github.com/PaddlePaddle/Paddle/pull/69217), [#69283](https://github.com/PaddlePaddle/Paddle/pull/69283), [#69288](https://github.com/PaddlePaddle/Paddle/pull/69288), [#69326](https://github.com/PaddlePaddle/Paddle/pull/69326), [#69365](https://github.com/PaddlePaddle/Paddle/pull/69365), [#69384](https://github.com/PaddlePaddle/Paddle/pull/69384), [#69426](https://github.com/PaddlePaddle/Paddle/pull/69426), [#69443](https://github.com/PaddlePaddle/Paddle/pull/69443), [#69462](https://github.com/PaddlePaddle/Paddle/pull/69462), [#69492](https://github.com/PaddlePaddle/Paddle/pull/69492), [#69628](https://github.com/PaddlePaddle/Paddle/pull/69628), [#69677](https://github.com/PaddlePaddle/Paddle/pull/69677), [#69697](https://github.com/PaddlePaddle/Paddle/pull/69697), [#69776](https://github.com/PaddlePaddle/Paddle/pull/69776), [#69896](https://github.com/PaddlePaddle/Paddle/pull/69896), [#70138](https://github.com/PaddlePaddle/Paddle/pull/70138), [#70182](https://github.com/PaddlePaddle/Paddle/pull/70182), [#70539](https://github.com/PaddlePaddle/Paddle/pull/70539), [#71116](https://github.com/PaddlePaddle/Paddle/pull/71116), [#71210](https://github.com/PaddlePaddle/Paddle/pull/71210)
-- Added MoE expert parallelism for text-only sparse scenarios, implementing a mesh-changing sharding conversion mechanism for expert parallelism with automatic invocation of all2all communication. [#66462](https://github.com/PaddlePaddle/Paddle/pull/66462), [#66750](https://github.com/PaddlePaddle/Paddle/pull/66750), [#68004](https://github.com/PaddlePaddle/Paddle/pull/68004), [#68053](https://github.com/PaddlePaddle/Paddle/pull/68053), [#68187](https://github.com/PaddlePaddle/Paddle/pull/68187), [#68477](https://github.com/PaddlePaddle/Paddle/pull/68477), [#69098](https://github.com/PaddlePaddle/Paddle/pull/69098), [#69262](https://github.com/PaddlePaddle/Paddle/pull/69262), [#69296](https://github.com/PaddlePaddle/Paddle/pull/69296), [#70715](https://github.com/PaddlePaddle/Paddle/pull/70715), [#71292](https://github.com/PaddlePaddle/Paddle/pull/71292), [#71320](https://github.com/PaddlePaddle/Paddle/pull/71320)
-- Added the `LocalLayer` interface, which supports mixing auto parallel and manual parallel in one model. It serves users who manage sharding states and communication operations themselves in heavily hand-optimized scenarios, and it addresses non-SPMD scenarios where the tensor sharding syntax cannot be used. [#70519](https://github.com/PaddlePaddle/Paddle/pull/70519), [#70525](https://github.com/PaddlePaddle/Paddle/pull/70525), [#70600](https://github.com/PaddlePaddle/Paddle/pull/70600), [#71232](https://github.com/PaddlePaddle/Paddle/pull/71232), [#71264](https://github.com/PaddlePaddle/Paddle/pull/71264), [#71373](https://github.com/PaddlePaddle/Paddle/pull/71373)
-- Completed adaptation for Kunlun chips so that users can run auto parallel programs on domestic hardware; support for other chips is in progress. [#70997](https://github.com/PaddlePaddle/Paddle/pull/70997), [#71126](https://github.com/PaddlePaddle/Paddle/pull/71126), [#71229](https://github.com/PaddlePaddle/Paddle/pull/71229), [#71289](https://github.com/PaddlePaddle/Paddle/pull/71289), [#71425](https://github.com/PaddlePaddle/Paddle/pull/71425), [#71500](https://github.com/PaddlePaddle/Paddle/pull/71500)
-- Added support for uneven sharding inference and sharding conversion when a data dimension is not divisible by the device dimension. [#66103](https://github.com/PaddlePaddle/Paddle/pull/66103), [#67756](https://github.com/PaddlePaddle/Paddle/pull/67756), [#69265](https://github.com/PaddlePaddle/Paddle/pull/69265), [#70072](https://github.com/PaddlePaddle/Paddle/pull/70072)
-- Upgraded `shard_dataloader` to support setting the number of gradient accumulation steps through `batch_sampler` and to support models with multiple inputs. [#65325](https://github.com/PaddlePaddle/Paddle/pull/65325), [#70659](https://github.com/PaddlePaddle/Paddle/pull/70659)
-- Upgraded parameter saving and loading to support asynchronous parameter saving, loading `master_weight` between dynamic and static graphs, parameter version control, and offload. [#66858](https://github.com/PaddlePaddle/Paddle/pull/66858), [#67427](https://github.com/PaddlePaddle/Paddle/pull/67427), [#70105](https://github.com/PaddlePaddle/Paddle/pull/70105), [#70639](https://github.com/PaddlePaddle/Paddle/pull/70639)
-- Added support for `PyLayer` in static graph mode, allowing distributed tensors to run inside `PyLayer`, to meet the need for dynamic-to-static conversion of models that contain `PyLayer`. [#67326](https://github.com/PaddlePaddle/Paddle/pull/67326), [#68190](https://github.com/PaddlePaddle/Paddle/pull/68190), [#69089](https://github.com/PaddlePaddle/Paddle/pull/69089), [#70831](https://github.com/PaddlePaddle/Paddle/pull/70831)
-- Added support for a user-defined `input_spec` in the dynamic-to-static interface, so users can pass in the `input_spec` they need. This resolves dynamic-to-static conversion failures caused by a mismatch between the data stream input format and the `input_spec` the model actually requires. [#69183](https://github.com/PaddlePaddle/Paddle/pull/69183)
-- Adapted and supported gradient clipping strategies for hybrid parallel scenarios. [#65259](https://github.com/PaddlePaddle/Paddle/pull/65259), [#65928](https://github.com/PaddlePaddle/Paddle/pull/65928), [#69287](https://github.com/PaddlePaddle/Paddle/pull/69287), [#69760](https://github.com/PaddlePaddle/Paddle/pull/69760), [#71421](https://github.com/PaddlePaddle/Paddle/pull/71421)
-- Added an unbalanced pipeline parallel strategy for cases where the number of model layers is not divisible by the number of devices, allowing users to assign different numbers of layers to different pipeline stages. [#69728](https://github.com/PaddlePaddle/Paddle/pull/69728), [#70164](https://github.com/PaddlePaddle/Paddle/pull/70164), [#70230](https://github.com/PaddlePaddle/Paddle/pull/70230)
-- Added the `set_mesh` and `get_mesh` interfaces for conveniently setting and getting the global mesh. [#69999](https://github.com/PaddlePaddle/Paddle/pull/69999)
-- Added a precision alignment switch between auto parallel and manual parallel, making it easier to verify precision after rewriting an existing manual-parallel model as auto parallel. [#67681](https://github.com/PaddlePaddle/Paddle/pull/67681)
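
Two items in the list above deal with sizes that do not divide evenly: uneven tensor sharding and unbalanced pipeline stages. The sketch below shows one way such a split can be computed. It is plain Python, not Paddle code. The function names are hypothetical, and the "first shards take one extra element" rule is an assumption chosen for illustration; the notes do not state which rule Paddle uses.

```python
from typing import List, Tuple


def uneven_shard_sizes(dim_size: int, num_shards: int) -> List[int]:
    """Split dim_size into num_shards parts whose sizes differ by at most 1.

    Assumed convention (hypothetical, for illustration only): the first
    `dim_size % num_shards` shards each receive one extra element.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    base, extra = divmod(dim_size, num_shards)
    return [base + 1 if rank < extra else base for rank in range(num_shards)]


def shard_bounds(dim_size: int, num_shards: int) -> List[Tuple[int, int]]:
    """Return the half-open [start, stop) range owned by each shard."""
    bounds = []
    start = 0
    for size in uneven_shard_sizes(dim_size, num_shards):
        bounds.append((start, start + size))
        start += size
    return bounds


if __name__ == "__main__":
    # A dimension of length 10 sharded over 4 devices.
    print(uneven_shard_sizes(10, 4))
    print(shard_bounds(10, 4))
```

The same arithmetic applies to assigning, say, 10 layers to 4 pipeline stages, although in practice users may prefer to place the lighter stages by hand.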
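+In version 3.1, we further polished the auto parallel architecture to improve its usability and its dynamic graph performance. Specifically, we improved the core auto parallel mechanisms: we added sharding inference rules for multiple operators, support sharding the same dimension of a distributed tensor across multiple mesh dimensions, and support dynamic graph parallel strategies (PP, CP, SEP, TP-CONV), among others. We also systematically optimized the performance of dynamic graph auto parallel, which now roughly matches manual parallelism on models such as the Llama series.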
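
To make "the same tensor dimension sharded by multiple mesh dimensions" concrete, here is a minimal plain-Python sketch that computes which slice of a dimension a device holds. It does not call Paddle. The function `local_slice` is hypothetical, and the nesting order (split along the first listed mesh dimension, then split each piece along the next) is an assumption for illustration rather than a statement of Paddle's actual layout.

```python
from typing import Sequence, Tuple


def local_slice(dim_size: int,
                mesh_shape: Sequence[int],
                coord: Sequence[int],
                shard_mesh_dims: Sequence[int]) -> Tuple[int, int]:
    """Return the half-open [start, stop) slice of one tensor dimension
    held by the device at mesh coordinate `coord`.

    The dimension is split along shard_mesh_dims[0] first; each resulting
    piece is then split along shard_mesh_dims[1], and so on (assumed order).
    """
    start, length = 0, dim_size
    for mesh_dim in shard_mesh_dims:
        parts = mesh_shape[mesh_dim]
        if length % parts != 0:
            raise ValueError("this sketch only handles evenly divisible sizes")
        length //= parts                     # size of one piece at this level
        start += coord[mesh_dim] * length    # offset of this device's piece
    return start, start + length


if __name__ == "__main__":
    # A 2x2 mesh; a dimension of length 8 is sharded by mesh dims 0 and 1.
    mesh_shape = (2, 2)
    for i in range(mesh_shape[0]):
        for j in range(mesh_shape[1]):
            print((i, j), local_slice(8, mesh_shape, (i, j), (0, 1)))
```

With a 2x2 mesh, each of the four devices ends up with a distinct quarter of the dimension, whereas sharding by a single mesh dimension would leave two devices holding identical halves.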
### Improvements
-Improvements and optimizations to operator sharding inference rules
-
-- Added sharding inference rules for the `add_n`, `split`, and `softmax_grad` operators. [#65606](https://github.com/PaddlePaddle/Paddle/pull/65606), [#69439](https://github.com/PaddlePaddle/Paddle/pull/69439)
-- Added sharding inference rules for the `assign` and `embedding_grad` operators. [#67457](https://github.com/PaddlePaddle/Paddle/pull/67457)
-- Added a sharding inference rule for the `clip` operator. [#70632](https://github.com/PaddlePaddle/Paddle/pull/70632)
-- Added sharding inference rules for the `dist_stack` and `gather_nd` operators. [#65426](https://github.com/PaddlePaddle/Paddle/pull/65426)
-- Added a sharding inference rule for the `dropout` operator. [#70216](https://github.com/PaddlePaddle/Paddle/pull/70216)
-- Added a sharding inference rule for the `fused_dropout_add` operator. [#67722](https://github.com/PaddlePaddle/Paddle/pull/67722)
-- Added a sharding inference rule for the `fast_ln` custom operator. [#68148](https://github.com/PaddlePaddle/Paddle/pull/68148)
-- Added sharding inference rules for the `greater_equal` and `less_equal` operators. [#68868](https://github.com/PaddlePaddle/Paddle/pull/68868)
-- Added sharding inference rules for the `greater_than` and `less_than` operators. [#68133](https://github.com/PaddlePaddle/Paddle/pull/68133)
-- Added a sharding inference rule for the `if` operator. [#69357](https://github.com/PaddlePaddle/Paddle/pull/69357)
-- Added sharding inference rules for the `logical_and`, `logical_not`, `logical_or`, and `logical_xor` operators. [#67840](https://github.com/PaddlePaddle/Paddle/pull/67840)
-- Added a sharding inference rule for the `logsumexp` operator. [#67840](https://github.com/PaddlePaddle/Paddle/pull/67840)
-- Added a sharding inference rule for the `non_zero` operator. [#67996](https://github.com/PaddlePaddle/Paddle/pull/67996)
-- Added a sharding inference rule for the `pad` operator. [#68304](https://github.com/PaddlePaddle/Paddle/pull/68304)
-- Added a sharding inference rule for the `p_norm` operator. [#68317](https://github.com/PaddlePaddle/Paddle/pull/68317)
-- Added a sharding inference rule for the `scatter_nd` operator. [#67980](https://github.com/PaddlePaddle/Paddle/pull/67980)
-- Added a sharding inference rule for the `sigmoid` operator. [#71092](https://github.com/PaddlePaddle/Paddle/pull/71092)
-
-Static graph auto parallel architecture upgraded on top of PIR
-
-- Upgraded mixed precision training (AMP). [#65089](https://github.com/PaddlePaddle/Paddle/pull/65089), [#65892](https://github.com/PaddlePaddle/Paddle/pull/65892), [#66418](https://github.com/PaddlePaddle/Paddle/pull/66418), [#66674](https://github.com/PaddlePaddle/Paddle/pull/66674), [#68545](https://github.com/PaddlePaddle/Paddle/pull/68545)
-- Upgraded the recomputation strategy. [#69681](https://github.com/PaddlePaddle/Paddle/pull/69681), [#70064](https://github.com/PaddlePaddle/Paddle/pull/70064)
-- Upgraded the parameter sharding parallel strategy. [#63542](https://github.com/PaddlePaddle/Paddle/pull/63542), [#67748](https://github.com/PaddlePaddle/Paddle/pull/67748), [#68288](https://github.com/PaddlePaddle/Paddle/pull/68288), [#68314](https://github.com/PaddlePaddle/Paddle/pull/68314), [#69059](https://github.com/PaddlePaddle/Paddle/pull/69059), [#71167](https://github.com/PaddlePaddle/Paddle/pull/71167)
-- Upgraded the pipeline parallel strategy. [#66810](https://github.com/PaddlePaddle/Paddle/pull/66810), [#67174](https://github.com/PaddlePaddle/Paddle/pull/67174), [#67522](https://github.com/PaddlePaddle/Paddle/pull/67522), [#68141](https://github.com/PaddlePaddle/Paddle/pull/68141), [#68742](https://github.com/PaddlePaddle/Paddle/pull/68742), [#68962](https://github.com/PaddlePaddle/Paddle/pull/68962), [#69052](https://github.com/PaddlePaddle/Paddle/pull/69052), [#69201](https://github.com/PaddlePaddle/Paddle/pull/69201), [#69244](https://github.com/PaddlePaddle/Paddle/pull/69244), [#69578](https://github.com/PaddlePaddle/Paddle/pull/69578), [#69584](https://github.com/PaddlePaddle/Paddle/pull/69584), [#69654](https://github.com/PaddlePaddle/Paddle/pull/69654), [#69799](https://github.com/PaddlePaddle/Paddle/pull/69799), [#69894](https://github.com/PaddlePaddle/Paddle/pull/69894), [#70360](https://github.com/PaddlePaddle/Paddle/pull/70360), [#70615](https://github.com/PaddlePaddle/Paddle/pull/70615)
-- Upgraded the gradient accumulation strategy. [#66641](https://github.com/PaddlePaddle/Paddle/pull/66641), [#67254](https://github.com/PaddlePaddle/Paddle/pull/67254), [#67907](https://github.com/PaddlePaddle/Paddle/pull/67907), [#68391](https://github.com/PaddlePaddle/Paddle/pull/68391), [#68460](https://github.com/PaddlePaddle/Paddle/pull/68460), [#68472](https://github.com/PaddlePaddle/Paddle/pull/68472), [#68664](https://github.com/PaddlePaddle/Paddle/pull/68664), [#68727](https://github.com/PaddlePaddle/Paddle/pull/68727), [#69171](https://github.com/PaddlePaddle/Paddle/pull/69171), [#69805](https://github.com/PaddlePaddle/Paddle/pull/69805)
-- Upgraded the operator fusion strategy. [#68087](https://github.com/PaddlePaddle/Paddle/pull/68087), [#68207](https://github.com/PaddlePaddle/Paddle/pull/68207), [#68383](https://github.com/PaddlePaddle/Paddle/pull/68383), [#68623](https://github.com/PaddlePaddle/Paddle/pull/68623), [#68650](https://github.com/PaddlePaddle/Paddle/pull/68650), [#68736](https://github.com/PaddlePaddle/Paddle/pull/68736), [#69103](https://github.com/PaddlePaddle/Paddle/pull/69103), [#70889](https://github.com/PaddlePaddle/Paddle/pull/70889)
-- Upgraded the `tensor_fusion` optimization strategy. [#66130](https://github.com/PaddlePaddle/Paddle/pull/66130), [#68475](https://github.com/PaddlePaddle/Paddle/pull/68475), [#69243](https://github.com/PaddlePaddle/Paddle/pull/69243), [#69560](https://github.com/PaddlePaddle/Paddle/pull/69560), [#69823](https://github.com/PaddlePaddle/Paddle/pull/69823), [#70195](https://github.com/PaddlePaddle/Paddle/pull/70195), [#70309](https://github.com/PaddlePaddle/Paddle/pull/70309), [#70363](https://github.com/PaddlePaddle/Paddle/pull/70363), [#70869](https://github.com/PaddlePaddle/Paddle/pull/70869)
-- Upgraded the tensor parallel optimization strategy. [#68182](https://github.com/PaddlePaddle/Paddle/pull/68182), [#68389](https://github.com/PaddlePaddle/Paddle/pull/68389)
-- Upgraded the sharding inference mechanism for custom operators. [#67614](https://github.com/PaddlePaddle/Paddle/pull/67614)
-- Upgraded the parameter saving and loading mechanism. [#66416](https://github.com/PaddlePaddle/Paddle/pull/66416), [#67045](https://github.com/PaddlePaddle/Paddle/pull/67045), [#67369](https://github.com/PaddlePaddle/Paddle/pull/67369), [#68203](https://github.com/PaddlePaddle/Paddle/pull/68203)
-- Optimized computation graph compilation time. [#68796](https://github.com/PaddlePaddle/Paddle/pull/68796)
-
-### Bug Fixes
-
-- Fixed bugs in the sharding inference mechanism and in the sharding inference rules of several operators. [#65702](https://github.com/PaddlePaddle/Paddle/pull/65702), [#65835](https://github.com/PaddlePaddle/Paddle/pull/65835), [#66098](https://github.com/PaddlePaddle/Paddle/pull/66098), [#66955](https://github.com/PaddlePaddle/Paddle/pull/66955), [#67052](https://github.com/PaddlePaddle/Paddle/pull/67052), [#67059](https://github.com/PaddlePaddle/Paddle/pull/67059), [#67101](https://github.com/PaddlePaddle/Paddle/pull/67101), [#67283](https://github.com/PaddlePaddle/Paddle/pull/67283), [#67729](https://github.com/PaddlePaddle/Paddle/pull/67729), [#67996](https://github.com/PaddlePaddle/Paddle/pull/67996), [#68413](https://github.com/PaddlePaddle/Paddle/pull/68413), [#68455](https://github.com/PaddlePaddle/Paddle/pull/68455), [#68533](https://github.com/PaddlePaddle/Paddle/pull/68533), [#68976](https://github.com/PaddlePaddle/Paddle/pull/68976), [#68977](https://github.com/PaddlePaddle/Paddle/pull/68977), [#69027](https://github.com/PaddlePaddle/Paddle/pull/69027), [#69203](https://github.com/PaddlePaddle/Paddle/pull/69203), [#69223](https://github.com/PaddlePaddle/Paddle/pull/69223), [#69862](https://github.com/PaddlePaddle/Paddle/pull/69862), [#69991](https://github.com/PaddlePaddle/Paddle/pull/69991), [#70100](https://github.com/PaddlePaddle/Paddle/pull/70100), [#70624](https://github.com/PaddlePaddle/Paddle/pull/70624), [#71024](https://github.com/PaddlePaddle/Paddle/pull/71024), [#71152](https://github.com/PaddlePaddle/Paddle/pull/71152), [#71214](https://github.com/PaddlePaddle/Paddle/pull/71214), [#71253](https://github.com/PaddlePaddle/Paddle/pull/71253), [#71388](https://github.com/PaddlePaddle/Paddle/pull/71388)
-- Fixed several bugs in the sharding conversion mechanism. [#65060](https://github.com/PaddlePaddle/Paddle/pull/65060), [#65820](https://github.com/PaddlePaddle/Paddle/pull/65820), [#67630](https://github.com/PaddlePaddle/Paddle/pull/67630), [#67809](https://github.com/PaddlePaddle/Paddle/pull/67809), [#68115](https://github.com/PaddlePaddle/Paddle/pull/68115), [#68468](https://github.com/PaddlePaddle/Paddle/pull/68468), [#70023](https://github.com/PaddlePaddle/Paddle/pull/70023)
-- Fixed a bug where `shard_degree` was inferred incorrectly in parameter sharding parallelism. [#68781](https://github.com/PaddlePaddle/Paddle/pull/68781), [#69214](https://github.com/PaddlePaddle/Paddle/pull/69214)
-- Fixed issues in `shard_dataloader`, including inconsistent results between dynamic and static graphs, sharding of dict-type data, and custom `sampler` scenarios. [#65262](https://github.com/PaddlePaddle/Paddle/pull/65262), [#66096](https://github.com/PaddlePaddle/Paddle/pull/66096), [#66882](https://github.com/PaddlePaddle/Paddle/pull/66882), [#69620](https://github.com/PaddlePaddle/Paddle/pull/69620)
-- Fixed a bug where `recompute` with `use_reentrant=false` was incompatible with parameter sharding. [#65188](https://github.com/PaddlePaddle/Paddle/pull/65188)
-- Fixed bugs in parameter loading and saving. [#66266](https://github.com/PaddlePaddle/Paddle/pull/66266), [#69764](https://github.com/PaddlePaddle/Paddle/pull/69764)
-- Fixed bugs in operators such as `Conv2D`, `fill_constant`, `flash_attn_grad`, `reduce_scatter`, `if`, `tuple_push`, and `tuple_pop`. [#67587](https://github.com/PaddlePaddle/Paddle/pull/67587), [#68008](https://github.com/PaddlePaddle/Paddle/pull/68008), [#68586](https://github.com/PaddlePaddle/Paddle/pull/68586), [#68589](https://github.com/PaddlePaddle/Paddle/pull/68589), [#69519](https://github.com/PaddlePaddle/Paddle/pull/69519), [#70207](https://github.com/PaddlePaddle/Paddle/pull/70207)
-- Fixed bugs in communication operators such as `reduce_scatter`, `p_send`, and `p_recv`. [#67386](https://github.com/PaddlePaddle/Paddle/pull/67386), [#71433](https://github.com/PaddlePaddle/Paddle/pull/71433)
-- Fixed bugs in tensor type promotion. [#66541](https://github.com/PaddlePaddle/Paddle/pull/66541), [#68342](https://github.com/PaddlePaddle/Paddle/pull/68342)
-- Fixed a bug where GPU memory was allocated automatically when converting a distributed tensor that is uninitialized on some devices to numpy. [#66361](https://github.com/PaddlePaddle/Paddle/pull/66361)
-- 修复非切分张量调用 to_tensor 时触发数据拷贝的 bug。[#67169](https://github.com/PaddlePaddle/Paddle/pull/67169)
-- 修复`scaler`参数切分的 bug。[#68289](https://github.com/PaddlePaddle/Paddle/pull/68289)
-- 修复`enable_delay_scale_loss`精度问题。[#68525](https://github.com/PaddlePaddle/Paddle/pull/68525)
-- 修复通信组创建顺序不同导致的 hang 问题。[#68847](https://github.com/PaddlePaddle/Paddle/pull/68847)
-- 修复静态图场景下`op_role`设置错误的 bug。[#67850](https://github.com/PaddlePaddle/Paddle/pull/67850), [#67986](https://github.com/PaddlePaddle/Paddle/pull/67986), [#68156](https://github.com/PaddlePaddle/Paddle/pull/68156)
-- 修复静态图下无法切分随机数算子输出变量的 bug。[#67589](https://github.com/PaddlePaddle/Paddle/pull/67589), [#67750](https://github.com/PaddlePaddle/Paddle/pull/67750), [#68067](https://github.com/PaddlePaddle/Paddle/pull/68067)
-- 修复静态图下计算图 cache 机制失效的 bug。[#68488](https://github.com/PaddlePaddle/Paddle/pull/68488)
-- 修复`paddle.distributed.to_distributed`索引越界的 bug。[#70174](https://github.com/PaddlePaddle/Paddle/pull/70174)
-- 修复流水并行可视化工具的 bug。[#71386](https://github.com/PaddlePaddle/Paddle/pull/71386)
-
-## 5. 算子机制
-
-算子相关 PR,包括组合算子拆分、新硬件适配算子 kernel、稀疏算子运算、旧 IR 算子退场等工作,为 PIR 适配编译器、多硬件并取得性能优势奠定了基础;规范了算子体系优化了代码结构,减少了技术债,并提升了可维护性。
+- 支持分布式张量的同一维度被多个 mesh 维度切分。 [#73233](https://github.com/PaddlePaddle/Paddle/pull/73233)
+- 支持自动并行通信拓扑描述 ProcessMesh 转换为手动并行通信组。 [#72052](https://github.com/PaddlePaddle/Paddle/pull/72052)
+- 支持任意可序列化 python object 的 send/recv。 [#72098](https://github.com/PaddlePaddle/Paddle/pull/72098)
+- 动态图并行策略补齐
+
+ - 支持流水线并行策略 1F1B 和 VPP 调度。 [#72155](https://github.com/PaddlePaddle/Paddle/pull/72155), [#72480](https://github.com/PaddlePaddle/Paddle/pull/72480), [#72179](https://github.com/PaddlePaddle/Paddle/pull/72179)
+ - 支持长文并行策略。 [#73195](https://github.com/PaddlePaddle/Paddle/pull/73195)
+ - 支持视觉并行策略。 [#73063](https://github.com/PaddlePaddle/Paddle/pull/73063), [#73039](https://github.com/PaddlePaddle/Paddle/pull/73039)
+ - 支持自动并行在数据并行维度的通信。[#72540](https://github.com/PaddlePaddle/Paddle/pull/72540)
+- 新增以下算子的切分推导规则
+
+ - `min`, `min_grad` [#72269](https://github.com/PaddlePaddle/Paddle/pull/72269)
+ - `bitwise_or`, `atan2`, `fmax`, `fmin`, `reciprocal` [#72310](https://github.com/PaddlePaddle/Paddle/pull/72310)
+ - `argmin`, `abs`, `cosh` [#72264](https://github.com/PaddlePaddle/Paddle/pull/72264)
+ - `mean_all`, `mean_all_grad` [#72479](https://github.com/PaddlePaddle/Paddle/pull/72479)
+ - `topk`, `topk_grad` [#72499](https://github.com/PaddlePaddle/Paddle/pull/72499)
+ - `argsort` [#72388](https://github.com/PaddlePaddle/Paddle/pull/72388)
+ - `round`, `mish`, `elu`, `selu`, `celu`, `stanh`, `softplus`, `softshrink`, `thresholded_relu`, `logit`, `nonzero` [#72312](https://github.com/PaddlePaddle/Paddle/pull/72312)
+ - `unique ops` [#72824](https://github.com/PaddlePaddle/Paddle/pull/72824)
+ - `put_along_axis` [#72766](https://github.com/PaddlePaddle/Paddle/pull/72766)
+ - `round_grad`, `trunc_grad`, `ceil_grad`, `floor_grad`, `poisson_grad` [#72677](https://github.com/PaddlePaddle/Paddle/pull/72677)
+ - `log_softmax`, `cummax`, `cummin` [#72720](https://github.com/PaddlePaddle/Paddle/pull/72720)
+ - `unary` [#72177](https://github.com/PaddlePaddle/Paddle/pull/72177)
+ - `unary_grad` [#72260](https://github.com/PaddlePaddle/Paddle/pull/72260)
+ - `index_select`, `index_select_grad` [#72727](https://github.com/PaddlePaddle/Paddle/pull/72727)
+ - `roll`, `roll_grad` [#72740](https://github.com/PaddlePaddle/Paddle/pull/72740)
+ - `empty_like` [#73169](https://github.com/PaddlePaddle/Paddle/pull/73169)
+ - `roi_align`, `roi_align_grad` [#72925](https://github.com/PaddlePaddle/Paddle/pull/72925)
+ - `expand_as`, `expand_as_grad` [#73107](https://github.com/PaddlePaddle/Paddle/pull/73107)
+ - `fused_gemm_epilogue` [#73126](https://github.com/PaddlePaddle/Paddle/pull/73126)
+ - `label_smooth`, `label_smooth_grad` [#72845](https://github.com/PaddlePaddle/Paddle/pull/72845)
+ - `group_norm`, `group_norm_grad` [#72946](https://github.com/PaddlePaddle/Paddle/pull/72946)
+ - `instance_norm`, `instance_norm_grad` [#72938](https://github.com/PaddlePaddle/Paddle/pull/72938)
+ - `batch_norm`, `sync_batch_norm` [#72918](https://github.com/PaddlePaddle/Paddle/pull/72918)
+ - `reduce_any` [#73175](https://github.com/PaddlePaddle/Paddle/pull/73175)
+ - `fused_gemm_epilogue_rule` [#73494](https://github.com/PaddlePaddle/Paddle/pull/73494)
-### 新特性
+### 性能优化
-- 支持组合算子拆分。 [#65148](https://github.com/PaddlePaddle/Paddle/pull/65148), [#65007](https://github.com/PaddlePaddle/Paddle/pull/65007), [#65482](https://github.com/PaddlePaddle/Paddle/pull/65482), [#65006](https://github.com/PaddlePaddle/Paddle/pull/65006), [#65692](https://github.com/PaddlePaddle/Paddle/pull/65692), [#65961](https://github.com/PaddlePaddle/Paddle/pull/65961), [#65968](https://github.com/PaddlePaddle/Paddle/pull/65968), [#65967](https://github.com/PaddlePaddle/Paddle/pull/65967), [#66510](https://github.com/PaddlePaddle/Paddle/pull/66510), [#66795](https://github.com/PaddlePaddle/Paddle/pull/66795), [#66835](https://github.com/PaddlePaddle/Paddle/pull/66835), [#67151](https://github.com/PaddlePaddle/Paddle/pull/67151), [#67342](https://github.com/PaddlePaddle/Paddle/pull/67342), [#67481](https://github.com/PaddlePaddle/Paddle/pull/67481), [#67502](https://github.com/PaddlePaddle/Paddle/pull/67502), [#67606](https://github.com/PaddlePaddle/Paddle/pull/67606), [#67757](https://github.com/PaddlePaddle/Paddle/pull/67757), [#67775](https://github.com/PaddlePaddle/Paddle/pull/67775), [#67891](https://github.com/PaddlePaddle/Paddle/pull/67891), [#67790](https://github.com/PaddlePaddle/Paddle/pull/67790), [#67965](https://github.com/PaddlePaddle/Paddle/pull/67965), [#67968](https://github.com/PaddlePaddle/Paddle/pull/67968), [#68168](https://github.com/PaddlePaddle/Paddle/pull/68168), [#68125](https://github.com/PaddlePaddle/Paddle/pull/68125), [#68228](https://github.com/PaddlePaddle/Paddle/pull/68228), [#68295](https://github.com/PaddlePaddle/Paddle/pull/68295), [#68353](https://github.com/PaddlePaddle/Paddle/pull/68353), [#68357](https://github.com/PaddlePaddle/Paddle/pull/68357), [#68827](https://github.com/PaddlePaddle/Paddle/pull/68827), [#68834](https://github.com/PaddlePaddle/Paddle/pull/68834), [#69239](https://github.com/PaddlePaddle/Paddle/pull/69239), [#68817](https://github.com/PaddlePaddle/Paddle/pull/68817), 
[#69108](https://github.com/PaddlePaddle/Paddle/pull/69108), [#69373](https://github.com/PaddlePaddle/Paddle/pull/69373), [#69372](https://github.com/PaddlePaddle/Paddle/pull/69372), [#68829](https://github.com/PaddlePaddle/Paddle/pull/68829), [#69684](https://github.com/PaddlePaddle/Paddle/pull/69684), [#68818](https://github.com/PaddlePaddle/Paddle/pull/68818), [#68835](https://github.com/PaddlePaddle/Paddle/pull/68835), [#69838](https://github.com/PaddlePaddle/Paddle/pull/69838), [#69998](https://github.com/PaddlePaddle/Paddle/pull/69998), [#69675](https://github.com/PaddlePaddle/Paddle/pull/69675), [#70367](https://github.com/PaddlePaddle/Paddle/pull/70367), [#70080](https://github.com/PaddlePaddle/Paddle/pull/70080), [#71352](https://github.com/PaddlePaddle/Paddle/pull/71352), [#66450](https://github.com/PaddlePaddle/Paddle/pull/66450), [#67593](https://github.com/PaddlePaddle/Paddle/pull/67593), [#67988](https://github.com/PaddlePaddle/Paddle/pull/67988), [#68346](https://github.com/PaddlePaddle/Paddle/pull/68346), [#68399](https://github.com/PaddlePaddle/Paddle/pull/68399), [#68319](https://github.com/PaddlePaddle/Paddle/pull/68319), [#68485](https://github.com/PaddlePaddle/Paddle/pull/68485), [#68961](https://github.com/PaddlePaddle/Paddle/pull/68961), [#68575](https://github.com/PaddlePaddle/Paddle/pull/68575)
-- PIR 支持 Pylayer。 [#69674](https://github.com/PaddlePaddle/Paddle/pull/69674), [#70375](https://github.com/PaddlePaddle/Paddle/pull/70375)
-- 支持 XPU 相关算子计算。 [#65684](https://github.com/PaddlePaddle/Paddle/pull/65684), [#65976](https://github.com/PaddlePaddle/Paddle/pull/65976), [#68497](https://github.com/PaddlePaddle/Paddle/pull/68497)
-- PIR 支持稀疏算子。 [#62663](https://github.com/PaddlePaddle/Paddle/pull/62663), [#67885](https://github.com/PaddlePaddle/Paddle/pull/67885), [#67976](https://github.com/PaddlePaddle/Paddle/pull/67976), [#68261](https://github.com/PaddlePaddle/Paddle/pull/68261), [#68326](https://github.com/PaddlePaddle/Paddle/pull/68326)
-- 支持手动 Recompute。 [#65879](https://github.com/PaddlePaddle/Paddle/pull/65879)
-- 实现 kernel 并注册算子。 [#63130](https://github.com/PaddlePaddle/Paddle/pull/63130)
-- 支持 Custom Op。 [#68824](https://github.com/PaddlePaddle/Paddle/pull/68824), [#68748](https://github.com/PaddlePaddle/Paddle/pull/68748)
-- 添加 acos 的动态图二阶反向组合。 [#70409](https://github.com/PaddlePaddle/Paddle/pull/70409)
-- 支持 0-size tensor 的初始化和计算。 [#70504](https://github.com/PaddlePaddle/Paddle/pull/70504)
+* 支持分组切分并行的 tensor_fusion 优化策略和 overlap 优化策略。 [#72551](https://github.com/PaddlePaddle/Paddle/pull/72551), [#72902](https://github.com/PaddlePaddle/Paddle/pull/72902), [#73142](https://github.com/PaddlePaddle/Paddle/pull/73142), [#71785](https://github.com/PaddlePaddle/Paddle/pull/71785)
+* 优化 reshard 模块,以降低通信开销。 [#71969](https://github.com/PaddlePaddle/Paddle/pull/71969), [#73024](https://github.com/PaddlePaddle/Paddle/pull/73024), [#71868](https://github.com/PaddlePaddle/Paddle/pull/71868)
+* 优化 multiply 的切分推导规则,以降低通信开销。[#73408](https://github.com/PaddlePaddle/Paddle/pull/73408)
+* 优化分布式切分状态为 Partial 时反向通信,以降低通信开销。 [#73236](https://github.com/PaddlePaddle/Paddle/pull/73236)
+* 梯度更新时通信融合优化。 [#72120](https://github.com/PaddlePaddle/Paddle/pull/72120), [#72745](https://github.com/PaddlePaddle/Paddle/pull/72745)
+* 优化 gelu 切分推导,以降低通信开销。 [#73279](https://github.com/PaddlePaddle/Paddle/pull/73279)
+* 优化 fused_rms_norm 在输入有 Partial 状态时的切分推导规则,以减少通信和计算开销。 [#73054](https://github.com/PaddlePaddle/Paddle/pull/73054)
### Bug 修复
-- 修复组合算子相关 Bug。 [#70250](https://github.com/PaddlePaddle/Paddle/pull/70250), [#67170](https://github.com/PaddlePaddle/Paddle/pull/67170), [#71218](https://github.com/PaddlePaddle/Paddle/pull/71218), [#69095](https://github.com/PaddlePaddle/Paddle/pull/69095), [#70189](https://github.com/PaddlePaddle/Paddle/pull/70189)
-- 修复 XPU 相关 Bug。 [#65149](https://github.com/PaddlePaddle/Paddle/pull/65149), [#70845](https://github.com/PaddlePaddle/Paddle/pull/70845)
-- 修复 shape 相关 Bug。 [#68722](https://github.com/PaddlePaddle/Paddle/pull/68722), [#70210](https://github.com/PaddlePaddle/Paddle/pull/70210), [#70492](https://github.com/PaddlePaddle/Paddle/pull/70492)
-- 修复 save/load 相关 Bug。 [#69153](https://github.com/PaddlePaddle/Paddle/pull/69153)
-- 修复类型相关 Bug。 [#65721](https://github.com/PaddlePaddle/Paddle/pull/65721), [#65859](https://github.com/PaddlePaddle/Paddle/pull/65859)
-- 其他算子调用和执行过程中的问题修复,包括类型匹配、类型推导、参数类型支持等,。 [#65360](https://github.com/PaddlePaddle/Paddle/pull/65360), [#65024](https://github.com/PaddlePaddle/Paddle/pull/65024), [#66308](https://github.com/PaddlePaddle/Paddle/pull/66308), [#67085](https://github.com/PaddlePaddle/Paddle/pull/67085), [#67285](https://github.com/PaddlePaddle/Paddle/pull/67285), [#67076](https://github.com/PaddlePaddle/Paddle/pull/67076), [#67547](https://github.com/PaddlePaddle/Paddle/pull/67547), [#68007](https://github.com/PaddlePaddle/Paddle/pull/68007), [#68527](https://github.com/PaddlePaddle/Paddle/pull/68527), [#68549](https://github.com/PaddlePaddle/Paddle/pull/68549), [#68543](https://github.com/PaddlePaddle/Paddle/pull/68543), [#68604](https://github.com/PaddlePaddle/Paddle/pull/68604), [#68741](https://github.com/PaddlePaddle/Paddle/pull/68741), [#68859](https://github.com/PaddlePaddle/Paddle/pull/68859), [#69025](https://github.com/PaddlePaddle/Paddle/pull/69025), [#69065](https://github.com/PaddlePaddle/Paddle/pull/69065), [#69405](https://github.com/PaddlePaddle/Paddle/pull/69405), [#69688](https://github.com/PaddlePaddle/Paddle/pull/69688), [#69912](https://github.com/PaddlePaddle/Paddle/pull/69912), [#70177](https://github.com/PaddlePaddle/Paddle/pull/70177), [#70517](https://github.com/PaddlePaddle/Paddle/pull/70517), [#70596](https://github.com/PaddlePaddle/Paddle/pull/70596), [#70788](https://github.com/PaddlePaddle/Paddle/pull/70788), [#70870](https://github.com/PaddlePaddle/Paddle/pull/70870), [#71332](https://github.com/PaddlePaddle/Paddle/pull/71332), [#71454](https://github.com/PaddlePaddle/Paddle/pull/71454), [#71442](https://github.com/PaddlePaddle/Paddle/pull/71442), [#71499](https://github.com/PaddlePaddle/Paddle/pull/71499), [#67459](https://github.com/PaddlePaddle/Paddle/pull/67459), [#68470](https://github.com/PaddlePaddle/Paddle/pull/68470), [#70206](https://github.com/PaddlePaddle/Paddle/pull/70206)
+- 修复虚拟流水线并行策略在 H 卡上通信 hang 的 bug。[#71104](https://github.com/PaddlePaddle/Paddle/pull/71104), [#73470](https://github.com/PaddlePaddle/Paddle/pull/73470)
+- 修复 save/load 的 bug。 [#72023](https://github.com/PaddlePaddle/Paddle/pull/72023)
+- 修复 linear_fused_grad_add 策略在动态图模式下跑不通的 bug。 [#72708](https://github.com/PaddlePaddle/Paddle/pull/72708)
+- 修复 fused_rms_norm 算子跑不通和精度 bug。 [#72663](https://github.com/PaddlePaddle/Paddle/pull/72663)
+- 修复 expand 算子切分推导规则的 bug。[#73154](https://github.com/PaddlePaddle/Paddle/pull/73154)
### 其他
-- 优化代码风格。 [#68536](https://github.com/PaddlePaddle/Paddle/pull/68536)
-- 修复拼写错误。 [#67456](https://github.com/PaddlePaddle/Paddle/pull/67456), [#66673](https://github.com/PaddlePaddle/Paddle/pull/66673), [#68702](https://github.com/PaddlePaddle/Paddle/pull/68702), [#68735](https://github.com/PaddlePaddle/Paddle/pull/68735), [#68718](https://github.com/PaddlePaddle/Paddle/pull/68718), [#70700](https://github.com/PaddlePaddle/Paddle/pull/70700), [#70682](https://github.com/PaddlePaddle/Paddle/pull/70682), [#70670](https://github.com/PaddlePaddle/Paddle/pull/70670), [#70241](https://github.com/PaddlePaddle/Paddle/pull/70241), [#69626](https://github.com/PaddlePaddle/Paddle/pull/69626), [#70051](https://github.com/PaddlePaddle/Paddle/pull/70051), [#67764](https://github.com/PaddlePaddle/Paddle/pull/67764), [#68872](https://github.com/PaddlePaddle/Paddle/pull/68872), [#70055](https://github.com/PaddlePaddle/Paddle/pull/70055), [#67954](https://github.com/PaddlePaddle/Paddle/pull/67954), [#67404](https://github.com/PaddlePaddle/Paddle/pull/67404), [#69273](https://github.com/PaddlePaddle/Paddle/pull/69273), [#66981](https://github.com/PaddlePaddle/Paddle/pull/66981), [#68145](https://github.com/PaddlePaddle/Paddle/pull/68145), [#69148](https://github.com/PaddlePaddle/Paddle/pull/69148), [#69145](https://github.com/PaddlePaddle/Paddle/pull/69145), [#69168](https://github.com/PaddlePaddle/Paddle/pull/69168), [#68940](https://github.com/PaddlePaddle/Paddle/pull/68940), [#70344](https://github.com/PaddlePaddle/Paddle/pull/70344)
-- 修改接口文档。 [#69378](https://github.com/PaddlePaddle/Paddle/pull/69378)
-- 替换 fluid 算子体系下的算子及参数命名。 [#69345](https://github.com/PaddlePaddle/Paddle/pull/69345), [#69382](https://github.com/PaddlePaddle/Paddle/pull/69382), [#69484](https://github.com/PaddlePaddle/Paddle/pull/69484), [#69444](https://github.com/PaddlePaddle/Paddle/pull/69444)
-
-### 废弃
-
-- xshape 输出退场。 [#66769](https://github.com/PaddlePaddle/Paddle/pull/66769), [#67009](https://github.com/PaddlePaddle/Paddle/pull/67009), [#67152](https://github.com/PaddlePaddle/Paddle/pull/67152), [#67172](https://github.com/PaddlePaddle/Paddle/pull/67172), [#67355](https://github.com/PaddlePaddle/Paddle/pull/67355), [#67373](https://github.com/PaddlePaddle/Paddle/pull/67373), [#66089](https://github.com/PaddlePaddle/Paddle/pull/66089)
-- 移除 fluid 体系下废弃的算子及其 kernel、相关单测、相关调用代码。 [#67370](https://github.com/PaddlePaddle/Paddle/pull/67370), [#67088](https://github.com/PaddlePaddle/Paddle/pull/67088), [#67324](https://github.com/PaddlePaddle/Paddle/pull/67324), [#67666](https://github.com/PaddlePaddle/Paddle/pull/67666), [#68058](https://github.com/PaddlePaddle/Paddle/pull/68058), [#68311](https://github.com/PaddlePaddle/Paddle/pull/68311), [#68358](https://github.com/PaddlePaddle/Paddle/pull/68358), [#68312](https://github.com/PaddlePaddle/Paddle/pull/68312), [#68355](https://github.com/PaddlePaddle/Paddle/pull/68355), [#67528](https://github.com/PaddlePaddle/Paddle/pull/67528), [#68316](https://github.com/PaddlePaddle/Paddle/pull/68316), [#68356](https://github.com/PaddlePaddle/Paddle/pull/68356), [#68397](https://github.com/PaddlePaddle/Paddle/pull/68397), [#68441](https://github.com/PaddlePaddle/Paddle/pull/68441), [#68417](https://github.com/PaddlePaddle/Paddle/pull/68417), [#68567](https://github.com/PaddlePaddle/Paddle/pull/68567), [#68583](https://github.com/PaddlePaddle/Paddle/pull/68583), [#68649](https://github.com/PaddlePaddle/Paddle/pull/68649), [#68331](https://github.com/PaddlePaddle/Paddle/pull/68331), [#68730](https://github.com/PaddlePaddle/Paddle/pull/68730), [#69754](https://github.com/PaddlePaddle/Paddle/pull/69754), [#69445](https://github.com/PaddlePaddle/Paddle/pull/69445), [#69921](https://github.com/PaddlePaddle/Paddle/pull/69921), [#70268](https://github.com/PaddlePaddle/Paddle/pull/70268), [#69446](https://github.com/PaddlePaddle/Paddle/pull/69446), [#69544](https://github.com/PaddlePaddle/Paddle/pull/69544), [#70272](https://github.com/PaddlePaddle/Paddle/pull/70272), [#69745](https://github.com/PaddlePaddle/Paddle/pull/69745), [#70300](https://github.com/PaddlePaddle/Paddle/pull/70300), [#70388](https://github.com/PaddlePaddle/Paddle/pull/70388), [#70421](https://github.com/PaddlePaddle/Paddle/pull/70421), [#70302](https://github.com/PaddlePaddle/Paddle/pull/70302), 
[#70445](https://github.com/PaddlePaddle/Paddle/pull/70445), [#69275](https://github.com/PaddlePaddle/Paddle/pull/69275), [#69081](https://github.com/PaddlePaddle/Paddle/pull/69081), [#70588](https://github.com/PaddlePaddle/Paddle/pull/70588), [#67778](https://github.com/PaddlePaddle/Paddle/pull/67778), [#67953](https://github.com/PaddlePaddle/Paddle/pull/67953), [#68093](https://github.com/PaddlePaddle/Paddle/pull/68093), [#68092](https://github.com/PaddlePaddle/Paddle/pull/68092), [#67684](https://github.com/PaddlePaddle/Paddle/pull/67684), [#69665](https://github.com/PaddlePaddle/Paddle/pull/69665), [#67915](https://github.com/PaddlePaddle/Paddle/pull/67915), [#67917](https://github.com/PaddlePaddle/Paddle/pull/67917), [#68403](https://github.com/PaddlePaddle/Paddle/pull/68403), [#68404](https://github.com/PaddlePaddle/Paddle/pull/68404), [#68969](https://github.com/PaddlePaddle/Paddle/pull/68969), [#68953](https://github.com/PaddlePaddle/Paddle/pull/68953), [#68954](https://github.com/PaddlePaddle/Paddle/pull/68954), [#68942](https://github.com/PaddlePaddle/Paddle/pull/68942), [#68950](https://github.com/PaddlePaddle/Paddle/pull/68950), [#69381](https://github.com/PaddlePaddle/Paddle/pull/69381), [#69380](https://github.com/PaddlePaddle/Paddle/pull/69380), [#69448](https://github.com/PaddlePaddle/Paddle/pull/69448), [#69680](https://github.com/PaddlePaddle/Paddle/pull/69680), [#69775](https://github.com/PaddlePaddle/Paddle/pull/69775), [#69812](https://github.com/PaddlePaddle/Paddle/pull/69812), [#69840](https://github.com/PaddlePaddle/Paddle/pull/69840), [#69828](https://github.com/PaddlePaddle/Paddle/pull/69828), [#69742](https://github.com/PaddlePaddle/Paddle/pull/69742), [#69923](https://github.com/PaddlePaddle/Paddle/pull/69923), [#69922](https://github.com/PaddlePaddle/Paddle/pull/69922), [#69904](https://github.com/PaddlePaddle/Paddle/pull/69904), [#70002](https://github.com/PaddlePaddle/Paddle/pull/70002), 
[#70054](https://github.com/PaddlePaddle/Paddle/pull/70054), [#70052](https://github.com/PaddlePaddle/Paddle/pull/70052), [#70053](https://github.com/PaddlePaddle/Paddle/pull/70053), [#70713](https://github.com/PaddlePaddle/Paddle/pull/70713), [#70718](https://github.com/PaddlePaddle/Paddle/pull/70718), [#70718](https://github.com/PaddlePaddle/Paddle/pull/70718), [#70717](https://github.com/PaddlePaddle/Paddle/pull/70717)
-- 移除废弃 Flag。 [#70727](https://github.com/PaddlePaddle/Paddle/pull/70727), [#70726](https://github.com/PaddlePaddle/Paddle/pull/70726)
-- 移除组合算子废弃 API。 [#69873](https://github.com/PaddlePaddle/Paddle/pull/69873), [#69309](https://github.com/PaddlePaddle/Paddle/pull/69309)
-
-### 开发者相关
-
-- 支持组合算子,包括适配算子、添加 Flag、测试用例等。 [#67725](https://github.com/PaddlePaddle/Paddle/pull/67725), [#65252](https://github.com/PaddlePaddle/Paddle/pull/65252), [#67590](https://github.com/PaddlePaddle/Paddle/pull/67590), [#68076](https://github.com/PaddlePaddle/Paddle/pull/68076), [#66711](https://github.com/PaddlePaddle/Paddle/pull/66711), [#68813](https://github.com/PaddlePaddle/Paddle/pull/68813), [#68928](https://github.com/PaddlePaddle/Paddle/pull/68928), [#69054](https://github.com/PaddlePaddle/Paddle/pull/69054), [#69156](https://github.com/PaddlePaddle/Paddle/pull/69156), [#69255](https://github.com/PaddlePaddle/Paddle/pull/69255), [#69460](https://github.com/PaddlePaddle/Paddle/pull/69460), [#70270](https://github.com/PaddlePaddle/Paddle/pull/70270)
-- 为算子添加单测。 [#68272](https://github.com/PaddlePaddle/Paddle/pull/68272), [#68490](https://github.com/PaddlePaddle/Paddle/pull/68490)
-- 增加算子 API 别名用于 PaddleCustomDevice。 [#69526](https://github.com/PaddlePaddle/Paddle/pull/69526)
-- 移动算子定义位置,使其只支持动态图。 [#69289](https://github.com/PaddlePaddle/Paddle/pull/69289)
-- 标注仅前向计算算算子。 [#68580](https://github.com/PaddlePaddle/Paddle/pull/68580)
-- 将 view 运算的反向算子改为复用前向算子,从而支持科学计算场景下高阶微分的需求。 [#71086](https://github.com/PaddlePaddle/Paddle/pull/71086)
-- 迁移算子文件位置/修改函数命名空间/修改函数参数名等。 [#66393](https://github.com/PaddlePaddle/Paddle/pull/66393), [#67066](https://github.com/PaddlePaddle/Paddle/pull/67066), [#67012](https://github.com/PaddlePaddle/Paddle/pull/67012), [#67243](https://github.com/PaddlePaddle/Paddle/pull/67243), [#67367](https://github.com/PaddlePaddle/Paddle/pull/67367), [#67760](https://github.com/PaddlePaddle/Paddle/pull/67760), [#67242](https://github.com/PaddlePaddle/Paddle/pull/67242), [#67189](https://github.com/PaddlePaddle/Paddle/pull/67189), [#67899](https://github.com/PaddlePaddle/Paddle/pull/67899), [#67687](https://github.com/PaddlePaddle/Paddle/pull/67687), [#68035](https://github.com/PaddlePaddle/Paddle/pull/68035), [#67682](https://github.com/PaddlePaddle/Paddle/pull/67682), [#68464](https://github.com/PaddlePaddle/Paddle/pull/68464), [#68469](https://github.com/PaddlePaddle/Paddle/pull/68469), [#67900](https://github.com/PaddlePaddle/Paddle/pull/67900), [#68563](https://github.com/PaddlePaddle/Paddle/pull/68563), [#68562](https://github.com/PaddlePaddle/Paddle/pull/68562), [#68564](https://github.com/PaddlePaddle/Paddle/pull/68564), [#68479](https://github.com/PaddlePaddle/Paddle/pull/68479), [#68588](https://github.com/PaddlePaddle/Paddle/pull/68588), [#68726](https://github.com/PaddlePaddle/Paddle/pull/68726), [#68719](https://github.com/PaddlePaddle/Paddle/pull/68719), [#68767](https://github.com/PaddlePaddle/Paddle/pull/68767), [#68557](https://github.com/PaddlePaddle/Paddle/pull/68557), [#68671](https://github.com/PaddlePaddle/Paddle/pull/68671), [#68786](https://github.com/PaddlePaddle/Paddle/pull/68786), [#67948](https://github.com/PaddlePaddle/Paddle/pull/67948), [#64999](https://github.com/PaddlePaddle/Paddle/pull/64999), [#68581](https://github.com/PaddlePaddle/Paddle/pull/68581), [#68361](https://github.com/PaddlePaddle/Paddle/pull/68361), [#68656](https://github.com/PaddlePaddle/Paddle/pull/68656), [#68396](https://github.com/PaddlePaddle/Paddle/pull/68396), 
[#68059](https://github.com/PaddlePaddle/Paddle/pull/68059), [#68785](https://github.com/PaddlePaddle/Paddle/pull/68785), [#68665](https://github.com/PaddlePaddle/Paddle/pull/68665), [#68869](https://github.com/PaddlePaddle/Paddle/pull/68869), [#67626](https://github.com/PaddlePaddle/Paddle/pull/67626), [#68921](https://github.com/PaddlePaddle/Paddle/pull/68921), [#69268](https://github.com/PaddlePaddle/Paddle/pull/69268), [#69271](https://github.com/PaddlePaddle/Paddle/pull/69271), [#69306](https://github.com/PaddlePaddle/Paddle/pull/69306), [#69302](https://github.com/PaddlePaddle/Paddle/pull/69302), [#69341](https://github.com/PaddlePaddle/Paddle/pull/69341), [#69364](https://github.com/PaddlePaddle/Paddle/pull/69364), [#69343](https://github.com/PaddlePaddle/Paddle/pull/69343), [#69383](https://github.com/PaddlePaddle/Paddle/pull/69383), [#69415](https://github.com/PaddlePaddle/Paddle/pull/69415), [#69437](https://github.com/PaddlePaddle/Paddle/pull/69437), [#69494](https://github.com/PaddlePaddle/Paddle/pull/69494), [#69541](https://github.com/PaddlePaddle/Paddle/pull/69541), [#69543](https://github.com/PaddlePaddle/Paddle/pull/69543), [#69540](https://github.com/PaddlePaddle/Paddle/pull/69540), [#69569](https://github.com/PaddlePaddle/Paddle/pull/69569), [#69568](https://github.com/PaddlePaddle/Paddle/pull/69568), [#69621](https://github.com/PaddlePaddle/Paddle/pull/69621), [#69622](https://github.com/PaddlePaddle/Paddle/pull/69622), [#69701](https://github.com/PaddlePaddle/Paddle/pull/69701), [#69702](https://github.com/PaddlePaddle/Paddle/pull/69702), [#69704](https://github.com/PaddlePaddle/Paddle/pull/69704), [#69743](https://github.com/PaddlePaddle/Paddle/pull/69743), [#69780](https://github.com/PaddlePaddle/Paddle/pull/69780), [#69814](https://github.com/PaddlePaddle/Paddle/pull/69814), [#69822](https://github.com/PaddlePaddle/Paddle/pull/69822), [#69893](https://github.com/PaddlePaddle/Paddle/pull/69893), 
[#69967](https://github.com/PaddlePaddle/Paddle/pull/69967), [#69976](https://github.com/PaddlePaddle/Paddle/pull/69976), [#70011](https://github.com/PaddlePaddle/Paddle/pull/70011), [#70015](https://github.com/PaddlePaddle/Paddle/pull/70015), [#70007](https://github.com/PaddlePaddle/Paddle/pull/70007), [#70010](https://github.com/PaddlePaddle/Paddle/pull/70010), [#70346](https://github.com/PaddlePaddle/Paddle/pull/70346), [#70414](https://github.com/PaddlePaddle/Paddle/pull/70414), [#69951](https://github.com/PaddlePaddle/Paddle/pull/69951), [#70299](https://github.com/PaddlePaddle/Paddle/pull/70299), [#70441](https://github.com/PaddlePaddle/Paddle/pull/70441), [#70435](https://github.com/PaddlePaddle/Paddle/pull/70435), [#68420](https://github.com/PaddlePaddle/Paddle/pull/68420), [#70671](https://github.com/PaddlePaddle/Paddle/pull/70671), [#70705](https://github.com/PaddlePaddle/Paddle/pull/70705), [#68540](https://github.com/PaddlePaddle/Paddle/pull/68540), [#70211](https://github.com/PaddlePaddle/Paddle/pull/70211), [#67489](https://github.com/PaddlePaddle/Paddle/pull/67489), [#66927](https://github.com/PaddlePaddle/Paddle/pull/66927), [#66942](https://github.com/PaddlePaddle/Paddle/pull/66942), [#66848](https://github.com/PaddlePaddle/Paddle/pull/66848), [#66796](https://github.com/PaddlePaddle/Paddle/pull/66796), [#67036](https://github.com/PaddlePaddle/Paddle/pull/67036), [#67244](https://github.com/PaddlePaddle/Paddle/pull/67244), [#67299](https://github.com/PaddlePaddle/Paddle/pull/67299), [#67171](https://github.com/PaddlePaddle/Paddle/pull/67171), [#67293](https://github.com/PaddlePaddle/Paddle/pull/67293), [#67208](https://github.com/PaddlePaddle/Paddle/pull/67208), [#67408](https://github.com/PaddlePaddle/Paddle/pull/67408), [#67523](https://github.com/PaddlePaddle/Paddle/pull/67523), [#67689](https://github.com/PaddlePaddle/Paddle/pull/67689), [#67694](https://github.com/PaddlePaddle/Paddle/pull/67694), 
[#67797](https://github.com/PaddlePaddle/Paddle/pull/67797), [#67894](https://github.com/PaddlePaddle/Paddle/pull/67894), [#65969](https://github.com/PaddlePaddle/Paddle/pull/65969), [#65939](https://github.com/PaddlePaddle/Paddle/pull/65939), [#67928](https://github.com/PaddlePaddle/Paddle/pull/67928), [#68097](https://github.com/PaddlePaddle/Paddle/pull/68097), [#66744](https://github.com/PaddlePaddle/Paddle/pull/66744), [#68496](https://github.com/PaddlePaddle/Paddle/pull/68496), [#66943](https://github.com/PaddlePaddle/Paddle/pull/66943), [#68773](https://github.com/PaddlePaddle/Paddle/pull/68773), [#69272](https://github.com/PaddlePaddle/Paddle/pull/69272)
-- 移动测试文件位置。 [#67564](https://github.com/PaddlePaddle/Paddle/pull/67564), [#68266](https://github.com/PaddlePaddle/Paddle/pull/68266), [#68634](https://github.com/PaddlePaddle/Paddle/pull/68634)
-- xshape 输出退场相关前置修改。 [#67543](https://github.com/PaddlePaddle/Paddle/pull/67543), [#67572](https://github.com/PaddlePaddle/Paddle/pull/67572)
-
-### 改进
-
-- 支持了更多数据类型。 [#69143](https://github.com/PaddlePaddle/Paddle/pull/69143)
-- 更新 xpu 接口。 [#69800](https://github.com/PaddlePaddle/Paddle/pull/69800)
-- 改进了算子打印功能。 [#69916](https://github.com/PaddlePaddle/Paddle/pull/69916)
-- 升级了 normalize 操作以支持更多场景。 [#70152](https://github.com/PaddlePaddle/Paddle/pull/70152)
-- 扩展了 group_norm 以处理 rank 大于 5 的情况。 [#68774](https://github.com/PaddlePaddle/Paddle/pull/68774)
-- 改进了 backward_blacklist 的使用。 [#69356](https://github.com/PaddlePaddle/Paddle/pull/69356)
-
-### 性能提升
-
-- 优化了 where_double_grad 算子的性能。 [#70404](https://github.com/PaddlePaddle/Paddle/pull/70404)
-- 将 for range 改为 slice 加快 grad 执行速度。 [#69938](https://github.com/PaddlePaddle/Paddle/pull/69938)
-
-## 6. 框架性能优化
+- 清理废弃代码,以便于维护代码。 [#71814](https://github.com/PaddlePaddle/Paddle/pull/71814), [#72538](https://github.com/PaddlePaddle/Paddle/pull/72538)
+- 新增 API local_map,将分布式张量传递给为普通张量编写的函数。 [#71804](https://github.com/PaddlePaddle/Paddle/pull/71804)
+- 为算子 fused_linear_param_grad_add 增加检查。 [#72483](https://github.com/PaddlePaddle/Paddle/pull/72483)
-性能优化相关 PR,包括优化算子性能、优化 kernel 表现、优化内存、优化命名空间等,给使用者带来更好的开发体验。
+## 5. 算子机制
### 新特性
-- 增强对 fp8 类型的支持。 [#64735](https://github.com/PaddlePaddle/Paddle/pull/64735), [#64955](https://github.com/PaddlePaddle/Paddle/pull/64955)
-- 增强对 xpu 的支持。 [#65362](https://github.com/PaddlePaddle/Paddle/pull/65362), [#65304](https://github.com/PaddlePaddle/Paddle/pull/65304), [#68451](https://github.com/PaddlePaddle/Paddle/pull/68451)
-- 增强对 DCU 的支持。 [#65398](https://github.com/PaddlePaddle/Paddle/pull/65398), [#65857](https://github.com/PaddlePaddle/Paddle/pull/65857), [#66423](https://github.com/PaddlePaddle/Paddle/pull/66423)
-- 扩展 oneDNN 能力。 [#66000](https://github.com/PaddlePaddle/Paddle/pull/66000), [#66474](https://github.com/PaddlePaddle/Paddle/pull/66474), [#66568](https://github.com/PaddlePaddle/Paddle/pull/66568)
-- 重命名参数并支持更复杂的 mask。 [#65409](https://github.com/PaddlePaddle/Paddle/pull/65409)
-- 支持 flash-attention。 [#68968](https://github.com/PaddlePaddle/Paddle/pull/68968)
-- 支持 OpenVINO CPU 高性能推理。 [#69122](https://github.com/PaddlePaddle/Paddle/pull/69122)
-
-### 功能改进
-
-- 增强 PIR pass 以实现更好融合。 [#65540](https://github.com/PaddlePaddle/Paddle/pull/65540)
-- 增强 OneDNN 功能。 [#65971](https://github.com/PaddlePaddle/Paddle/pull/65971), [#70430](https://github.com/PaddlePaddle/Paddle/pull/70430), [#70630](https://github.com/PaddlePaddle/Paddle/pull/70630), [#70871](https://github.com/PaddlePaddle/Paddle/pull/70871)
-- 提升 FlashMask 性能。 [#68109](https://github.com/PaddlePaddle/Paddle/pull/68109)
-- 优化 kernel 表现。 [#69660](https://github.com/PaddlePaddle/Paddle/pull/69660), [#69596](https://github.com/PaddlePaddle/Paddle/pull/69596)
-- 组合算子优化。 [#69515](https://github.com/PaddlePaddle/Paddle/pull/69515), [#69616](https://github.com/PaddlePaddle/Paddle/pull/69616)
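+- Gradient and automatic differentiation: added initial double-gradient support for put_along_axis and repeat_interleave, improved the numerical stability of complex operators under automatic differentiation, and implemented operator decomposition for masked_fill. [#72789](https://github.com/PaddlePaddle/Paddle/pull/72789), [#73056](https://github.com/PaddlePaddle/Paddle/pull/73056), [#73225](https://github.com/PaddlePaddle/Paddle/pull/73225)
+- Operator mechanism extension: added custom support for `__radd__` and `__rmul__`, strengthening the framework's ability to overload asymmetric operators. [#73119](https://github.com/PaddlePaddle/Paddle/pull/73119)
+- FP8 module support and operator development: added FP8 block-quantized GEMM support and introduced several fused operators that provide efficient operator-level implementations for Mixture-of-Experts (MoE) models, improving training and inference performance. [#73228](https://github.com/PaddlePaddle/Paddle/pull/73228), [#73285](https://github.com/PaddlePaddle/Paddle/pull/73285), [#73133](https://github.com/PaddlePaddle/Paddle/pull/73133), [#73364](https://github.com/PaddlePaddle/Paddle/pull/73364), [#73520](https://github.com/PaddlePaddle/Paddle/pull/73520), [#73531](https://github.com/PaddlePaddle/Paddle/pull/73531)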
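
The `__radd__` and `__rmul__` hooks mentioned above are part of the standard Python data model: Python calls them when the left operand does not know how to handle the right one. The sketch below illustrates that dispatch with a hypothetical `Scaled` wrapper class; it is not Paddle's implementation and makes no claim about how the framework registers these methods.

```python
class Scaled:
    """Hypothetical wrapper type, used only to illustrate reflected operators."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Handles `Scaled + x`.
        other_value = other.value if isinstance(other, Scaled) else other
        return Scaled(self.value + other_value)

    def __radd__(self, other):
        # Handles `x + Scaled` after type(x).__add__ returns NotImplemented.
        return Scaled(other + self.value)

    def __mul__(self, other):
        # Handles `Scaled * x`.
        other_value = other.value if isinstance(other, Scaled) else other
        return Scaled(self.value * other_value)

    def __rmul__(self, other):
        # Handles `x * Scaled` after type(x).__mul__ returns NotImplemented.
        return Scaled(other * self.value)


left_add = 2 + Scaled(3)   # int cannot add a Scaled, so Scaled.__radd__ runs
left_mul = 2 * Scaled(3)   # int cannot multiply a Scaled, so Scaled.__rmul__ runs
```

Without the reflected methods, both expressions would raise `TypeError`, because `int` has no rule for a `Scaled` right operand.
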
### Bug Fixes
-- 修复 PIR、CINN、SOT、OneDNN 等相关的 Bug。 [#68951](https://github.com/PaddlePaddle/Paddle/pull/68951), [#69553](https://github.com/PaddlePaddle/Paddle/pull/69553), [#69682](https://github.com/PaddlePaddle/Paddle/pull/69682), [#67741](https://github.com/PaddlePaddle/Paddle/pull/67741), [#69346](https://github.com/PaddlePaddle/Paddle/pull/69346), [#69401](https://github.com/PaddlePaddle/Paddle/pull/69401), [#68903](https://github.com/PaddlePaddle/Paddle/pull/68903)
-- 修复组合算子相关 Bug。 [#69479](https://github.com/PaddlePaddle/Paddle/pull/69479), [#69487](https://github.com/PaddlePaddle/Paddle/pull/69487), [#67176](https://github.com/PaddlePaddle/Paddle/pull/67176)
-- 修复 CPU 上的 FP8 数据类型问题。 [#65539](https://github.com/PaddlePaddle/Paddle/pull/65539)
-- 去除计算流下不必要的创建 event 的开销 。 [#67315](https://github.com/PaddlePaddle/Paddle/pull/67247)
-- 修复性能问题。 [#68378](https://github.com/PaddlePaddle/Paddle/pull/68378)
-- 修复类型相关问题。 [#69720](https://github.com/PaddlePaddle/Paddle/pull/69720)
-- 修复其他问题。 [#70019](https://github.com/PaddlePaddle/Paddle/pull/70019), [#70008](https://github.com/PaddlePaddle/Paddle/pull/70008), [#70645](https://github.com/PaddlePaddle/Paddle/pull/70645), [#71209](https://github.com/PaddlePaddle/Paddle/pull/71209), [#68152](https://github.com/PaddlePaddle/Paddle/pull/68152), [#69907](https://github.com/PaddlePaddle/Paddle/pull/69907), [#71207](https://github.com/PaddlePaddle/Paddle/pull/71207)
-
-### 性能优化
-
-- CINN 编译器相关优化。 [#69455](https://github.com/PaddlePaddle/Paddle/pull/69455), [#70284](https://github.com/PaddlePaddle/Paddle/pull/70284), [#67576](https://github.com/PaddlePaddle/Paddle/pull/67576), [#68946](https://github.com/PaddlePaddle/Paddle/pull/68946), [#68615](https://github.com/PaddlePaddle/Paddle/pull/68615)
-- oneDNN 相关优化。 [#68784](https://github.com/PaddlePaddle/Paddle/pull/68784), [#68716](https://github.com/PaddlePaddle/Paddle/pull/68716), [#67554](https://github.com/PaddlePaddle/Paddle/pull/67554)
-- Memory-related optimizations. [#68660](https://github.com/PaddlePaddle/Paddle/pull/68571), [#69930](https://github.com/PaddlePaddle/Paddle/pull/69930), [#68174](https://github.com/PaddlePaddle/Paddle/pull/68174), [#70359](https://github.com/PaddlePaddle/Paddle/pull/70359)
-- kernel 计算相关优化。 [#65507](https://github.com/PaddlePaddle/Paddle/pull/65507), [#68541](https://github.com/PaddlePaddle/Paddle/pull/68541), [#71479](https://github.com/PaddlePaddle/Paddle/pull/71479), [#71403](https://github.com/PaddlePaddle/Paddle/pull/71403)
-- XPU 相关优化。 [#67051](https://github.com/PaddlePaddle/Paddle/pull/67051)
-- 其他优化例如推理过程的 pass 优化、动态 shape 在自动并行的优化及 FlashAttention 计算优化等。 [#68394](https://github.com/PaddlePaddle/Paddle/pull/68394), [#68696](https://github.com/PaddlePaddle/Paddle/pull/68696), [#68759](https://github.com/PaddlePaddle/Paddle/pull/68759), [#68791](https://github.com/PaddlePaddle/Paddle/pull/68791), [#69390](https://github.com/PaddlePaddle/Paddle/pull/69390), [#69961](https://github.com/PaddlePaddle/Paddle/pull/69961), [#69939](https://github.com/PaddlePaddle/Paddle/pull/69939), [#70455](https://github.com/PaddlePaddle/Paddle/pull/70455), [#70663](https://github.com/PaddlePaddle/Paddle/pull/70663), [#71290](https://github.com/PaddlePaddle/Paddle/pull/71123)
-
-### 其他
-
-- 修改函数命名空间。 [#66818](https://github.com/PaddlePaddle/Paddle/pull/66818), [#67023](https://github.com/PaddlePaddle/Paddle/pull/67023), [#67114](https://github.com/PaddlePaddle/Paddle/pull/67114), [#67217](https://github.com/PaddlePaddle/Paddle/pull/67217), [#67524](https://github.com/PaddlePaddle/Paddle/pull/67524), [#67796](https://github.com/PaddlePaddle/Paddle/pull/67796), [#67881](https://github.com/PaddlePaddle/Paddle/pull/67881)
-- 升级 OneDNN。 [#69917](https://github.com/PaddlePaddle/Paddle/pull/69917)
-- 修改 pass 等级。 [#69524](https://github.com/PaddlePaddle/Paddle/pull/69524)
-- 内存读写相关优化。 [#65804](https://github.com/PaddlePaddle/Paddle/pull/65804), [#66923](https://github.com/PaddlePaddle/Paddle/pull/66923)
-- 优化 GetValueName 相关签名。 [#66363](https://github.com/PaddlePaddle/Paddle/pull/66363), [#66559](https://github.com/PaddlePaddle/Paddle/pull/66559), [#66738](https://github.com/PaddlePaddle/Paddle/pull/66738)
+- Gradient and automatic differentiation stability: fixed gradient computation errors in some backward operators, improving numerical stability and functional correctness in automatic differentiation scenarios. [#71716](https://github.com/PaddlePaddle/Paddle/pull/71716), [#72299](https://github.com/PaddlePaddle/Paddle/pull/72299), [#72358](https://github.com/PaddlePaddle/Paddle/pull/72358), [#73037](https://github.com/PaddlePaddle/Paddle/pull/73037), [#73140](https://github.com/PaddlePaddle/Paddle/pull/73140), [#73185](https://github.com/PaddlePaddle/Paddle/pull/73185)
+- Numerical precision and overflow protection: resolved numerical overflow, precision loss, and large-tensor overflow issues, making low-precision computation and large-tensor operations more reliable. [#72584](https://github.com/PaddlePaddle/Paddle/pull/72584), [#72608](https://github.com/PaddlePaddle/Paddle/pull/72608), [#72681](https://github.com/PaddlePaddle/Paddle/pull/72681), [#72639](https://github.com/PaddlePaddle/Paddle/pull/72639), [#73245](https://github.com/PaddlePaddle/Paddle/pull/73245), [#73359](https://github.com/PaddlePaddle/Paddle/pull/73359), [#72456](https://github.com/PaddlePaddle/Paddle/pull/72456)
+- Operator logic and framework alignment: aligned operator computation logic, fixed abnormal-input handling in some operators, and added checks to safeguard framework correctness. [#72282](https://github.com/PaddlePaddle/Paddle/pull/72282), [#71863](https://github.com/PaddlePaddle/Paddle/pull/71863), [#72650](https://github.com/PaddlePaddle/Paddle/pull/72650), [#72843](https://github.com/PaddlePaddle/Paddle/pull/72843), [#73070](https://github.com/PaddlePaddle/Paddle/pull/73070), [#73141](https://github.com/PaddlePaddle/Paddle/pull/73141), [#73203](https://github.com/PaddlePaddle/Paddle/pull/73203), [#73350](https://github.com/PaddlePaddle/Paddle/pull/73350), [#73440](https://github.com/PaddlePaddle/Paddle/pull/73440), [#73539](https://github.com/PaddlePaddle/Paddle/pull/73539), [#73339](https://github.com/PaddlePaddle/Paddle/pull/73339)
+- CUDA kernel and hardware adaptation: added support for the NVIDIA SM90 architecture, fixed overflow and related issues, and removed redundant CUDA error checks, improving GPU computation efficiency and adaptation to new hardware. [#72507](https://github.com/PaddlePaddle/Paddle/pull/72507), [#72849](https://github.com/PaddlePaddle/Paddle/pull/72849), [#72959](https://github.com/PaddlePaddle/Paddle/pull/72959), [#73130](https://github.com/PaddlePaddle/Paddle/pull/73130), [#73489](https://github.com/PaddlePaddle/Paddle/pull/73489)
-### 废弃
-
-- 删除废弃文件、功能。 [#67514](https://github.com/PaddlePaddle/Paddle/pull/67514), [#67811](https://github.com/PaddlePaddle/Paddle/pull/67811), [#67911](https://github.com/PaddlePaddle/Paddle/pull/67911)
-
-## 7. 推理部署
-
-重点围绕**新一代中间表示(PIR)生态建设**与**大模型推理优化**两大核心方向, 主要突破包括:
-
-1. **PIR-TensorRT 深度融合**
+### Enhancements
- - 完成核心执行机制重构与代码优化,开发 50+算子转换器
- - 新增低精度支持(FP16/INT8)与 Generic Plugin 执行能力
- - 构建完整单测体系,支持模型加载/保存全流程
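+- Added an int64_t version of fast division and modulo, improving computation performance and numerical stability in large-integer scenarios. [#72530](https://github.com/PaddlePaddle/Paddle/pull/72530)
+- Optimized the strided tensor copy kernel, improving data-copy efficiency for non-contiguous memory layouts. [#72662](https://github.com/PaddlePaddle/Paddle/pull/72662)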
-2. **大模型推理性能飞跃**
+- Unified the usage of quantization APIs across dynamic-graph and static-graph modes, simplifying quantized model development. [#73100](https://github.com/PaddlePaddle/Paddle/pull/73100)
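
For readers unfamiliar with fast division and modulo, the sketch below shows the general multiply-and-shift technique for a fixed divisor. It is a minimal illustration in Python, assuming non-negative dividends below 2**64; the class name `FastDivMod` and its layout are hypothetical and are not taken from the Paddle source. A real kernel would need extra care because the multiplier can exceed 64 bits, whereas Python integers have arbitrary precision.

```python
class FastDivMod:
    """Divide by a fixed positive divisor with one multiply and one shift."""

    BITS = 64  # assumption: dividends satisfy 0 <= n < 2**64

    def __init__(self, divisor):
        if divisor <= 0:
            raise ValueError("divisor must be positive")
        self.divisor = divisor
        # (divisor - 1).bit_length() equals ceil(log2(divisor)) for divisor >= 1.
        self.shift = self.BITS + (divisor - 1).bit_length()
        # multiplier = ceil(2**shift / divisor), precomputed once per divisor.
        self.multiplier = -(-(1 << self.shift) // divisor)

    def divmod(self, n):
        # The quotient comes from a multiply and a shift; no division is used here.
        quotient = (n * self.multiplier) >> self.shift
        remainder = n - quotient * self.divisor
        return quotient, remainder


# Typical use: turn a flat index into (row, column) for a row width of 7.
row_col = FastDivMod(7).divmod(100)
```

The precomputation pays off when the same divisor is applied many times, for example when converting flat indices to multi-dimensional coordinates.
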
- - 新增混合专家系统(MoE)全流程支持,覆盖 Hopper 架构优化
- - 支持 128K 超长序列处理,提升长文本推理能力
- - 实现 FP8/W8A8 等前沿量化方案,降低显存占用
-
-3. **基础架构全面升级**
-
- - OneDNN 升级至 3.6 版本,CPU 推理性能显著提升
- - 模型加载速度优化 40%+,支持 PIR 模型快速加载
- - 完善分布式推理支持,修复 allreduce 数据类型问题
+### Performance Improvements
-### 新增功能
+- Optimized the performance of the gelu operator decomposition, improving computation efficiency. [#72812](https://github.com/PaddlePaddle/Paddle/pull/72812)
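
As background on what a gelu decomposition computes, the sketch below writes GELU in terms of primitive operations, using the standard exact (erf-based) formula and the common tanh approximation. The function names are illustrative assumptions; this is not the decomposition rule implemented in the PR above.

```python
import math


def gelu_exact(x):
    # GELU(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    # Common tanh-based approximation built from multiply, add, and tanh.
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


samples = [-3.0, -1.0, 0.0, 1.0, 3.0]
exact_values = [gelu_exact(x) for x in samples]
approx_values = [gelu_tanh(x) for x in samples]
```
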
-- 支持基于飞桨新一代中间表示(PIR)的 Paddle-TensorRT
- - Developed core execution mechanisms and optimized the code. [#64995](https://github.com/PaddlePaddle/Paddle/pull/64995), [#67054](https://github.com/PaddlePaddle/Paddle/pull/67054), [#67660](https://github.com/PaddlePaddle/Paddle/pull/67660), [#67755](https://github.com/PaddlePaddle/Paddle/pull/67755), [#70762](https://github.com/PaddlePaddle/Paddle/pull/70762)
- - Developed operator Markers and Converters. [#67753](https://github.com/PaddlePaddle/Paddle/pull/67753), [#67956](https://github.com/PaddlePaddle/Paddle/pull/67956), [#68084](https://github.com/PaddlePaddle/Paddle/pull/68084), [#67974](https://github.com/PaddlePaddle/Paddle/pull/67974), [#68395](https://github.com/PaddlePaddle/Paddle/pull/68395), [#68216](https://github.com/PaddlePaddle/Paddle/pull/68216), [#68529](https://github.com/PaddlePaddle/Paddle/pull/68529), [#68608](https://github.com/PaddlePaddle/Paddle/pull/68608), [#68663](https://github.com/PaddlePaddle/Paddle/pull/68663), [#68757](https://github.com/PaddlePaddle/Paddle/pull/68757), [#68614](https://github.com/PaddlePaddle/Paddle/pull/68614), [#68783](https://github.com/PaddlePaddle/Paddle/pull/68783), [#68775](https://github.com/PaddlePaddle/Paddle/pull/68775), [#68839](https://github.com/PaddlePaddle/Paddle/pull/68839), [#68686](https://github.com/PaddlePaddle/Paddle/pull/68686), [#68840](https://github.com/PaddlePaddle/Paddle/pull/68840), [#68941](https://github.com/PaddlePaddle/Paddle/pull/68941), [#69015](https://github.com/PaddlePaddle/Paddle/pull/69015), [#69038](https://github.com/PaddlePaddle/Paddle/pull/69038), [#69117](https://github.com/PaddlePaddle/Paddle/pull/69117), [#69208](https://github.com/PaddlePaddle/Paddle/pull/69208), [#69315](https://github.com/PaddlePaddle/Paddle/pull/69315), [#69261](https://github.com/PaddlePaddle/Paddle/pull/69261), [#68878](https://github.com/PaddlePaddle/Paddle/pull/68878), [#69705](https://github.com/PaddlePaddle/Paddle/pull/69705), [#69706](https://github.com/PaddlePaddle/Paddle/pull/69706), [#70170](https://github.com/PaddlePaddle/Paddle/pull/70170), [#70267](https://github.com/PaddlePaddle/Paddle/pull/70267), [#70429](https://github.com/PaddlePaddle/Paddle/pull/70429), [#69330](https://github.com/PaddlePaddle/Paddle/pull/69330), [#70507](https://github.com/PaddlePaddle/Paddle/pull/70507), [#70535](https://github.com/PaddlePaddle/Paddle/pull/70535), [#70667](https://github.com/PaddlePaddle/Paddle/pull/70667), [#70816](https://github.com/PaddlePaddle/Paddle/pull/70816), [#70826](https://github.com/PaddlePaddle/Paddle/pull/70826), [#70955](https://github.com/PaddlePaddle/Paddle/pull/70955), [#71028](https://github.com/PaddlePaddle/Paddle/pull/71028), [#71013](https://github.com/PaddlePaddle/Paddle/pull/71013), [#71157](https://github.com/PaddlePaddle/Paddle/pull/71157), [#71231](https://github.com/PaddlePaddle/Paddle/pull/71231), [#69199](https://github.com/PaddlePaddle/Paddle/pull/69199), [#68956](https://github.com/PaddlePaddle/Paddle/pull/68956), [#66658](https://github.com/PaddlePaddle/Paddle/pull/66658), [#66811](https://github.com/PaddlePaddle/Paddle/pull/66811), [#67519](https://github.com/PaddlePaddle/Paddle/pull/67519), [#67877](https://github.com/PaddlePaddle/Paddle/pull/67877), [#68090](https://github.com/PaddlePaddle/Paddle/pull/68090), [#69086](https://github.com/PaddlePaddle/Paddle/pull/69086), [#68787](https://github.com/PaddlePaddle/Paddle/pull/68787), [#68778](https://github.com/PaddlePaddle/Paddle/pull/68778), [#69318](https://github.com/PaddlePaddle/Paddle/pull/69318), [#69995](https://github.com/PaddlePaddle/Paddle/pull/69995), [#70325](https://github.com/PaddlePaddle/Paddle/pull/70325), [#70817](https://github.com/PaddlePaddle/Paddle/pull/70817), [#70879](https://github.com/PaddlePaddle/Paddle/pull/70879), [#70875](https://github.com/PaddlePaddle/Paddle/pull/70875), [#71041](https://github.com/PaddlePaddle/Paddle/pull/71041), [#68876](https://github.com/PaddlePaddle/Paddle/pull/68876)
- - Added support for Generic Plugin execution. [#66634](https://github.com/PaddlePaddle/Paddle/pull/66634), [#70251](https://github.com/PaddlePaddle/Paddle/pull/70251)
- - Added support for low precision (FP16, INT8). [#69597](https://github.com/PaddlePaddle/Paddle/pull/69597), [#71127](https://github.com/PaddlePaddle/Paddle/pull/71127)
- - Improved supporting features such as the unit-test system and pass usage. [#67525](https://github.com/PaddlePaddle/Paddle/pull/67525), [#68034](https://github.com/PaddlePaddle/Paddle/pull/68034), [#71281](https://github.com/PaddlePaddle/Paddle/pull/71281), [#71235](https://github.com/PaddlePaddle/Paddle/pull/71235), [#67568](https://github.com/PaddlePaddle/Paddle/pull/67568), [#70139](https://github.com/PaddlePaddle/Paddle/pull/70139), [#70529](https://github.com/PaddlePaddle/Paddle/pull/70529)
-- 大模型推理优化
- - 新增 fused_moe 功能支持(基础支持/非规范 TopK/Hopper 架构)[#66084](https://github.com/PaddlePaddle/Paddle/pull/66084), [#67425](https://github.com/PaddlePaddle/Paddle/pull/67425), [#67732](https://github.com/PaddlePaddle/Paddle/pull/67732)
- - 支持混合精度计算(GQA 混合精度/BF16 注册)[#65078](https://github.com/PaddlePaddle/Paddle/pull/65078), [#67769](https://github.com/PaddlePaddle/Paddle/pull/67769)
- - 新增推理优化功能(动态图推理/128K 长序列支持)[#65962](https://github.com/PaddlePaddle/Paddle/pull/65962), [#70088](https://github.com/PaddlePaddle/Paddle/pull/70088)
- - 新增量化推理算子实现(FP8 W8A8 计算/weight only int4 量化)[#65441](https://github.com/PaddlePaddle/Paddle/pull/65441), [#64094](https://github.com/PaddlePaddle/Paddle/pull/64094)
+### Others
-### 功能完善
+- Standardized and retired fluid operators. [#71789](https://github.com/PaddlePaddle/Paddle/pull/71789), [#71818](https://github.com/PaddlePaddle/Paddle/pull/71818), [#71808](https://github.com/PaddlePaddle/Paddle/pull/71808), [#71860](https://github.com/PaddlePaddle/Paddle/pull/71860), [#71806](https://github.com/PaddlePaddle/Paddle/pull/71806), [#72011](https://github.com/PaddlePaddle/Paddle/pull/72011), [#72043](https://github.com/PaddlePaddle/Paddle/pull/72043), [#72034](https://github.com/PaddlePaddle/Paddle/pull/72034), [#72047](https://github.com/PaddlePaddle/Paddle/pull/72047), [#72056](https://github.com/PaddlePaddle/Paddle/pull/72056), [#72087](https://github.com/PaddlePaddle/Paddle/pull/72087), [#72086](https://github.com/PaddlePaddle/Paddle/pull/72086), [#72083](https://github.com/PaddlePaddle/Paddle/pull/72083), [#72079](https://github.com/PaddlePaddle/Paddle/pull/72079), [#72078](https://github.com/PaddlePaddle/Paddle/pull/72078), [#72076](https://github.com/PaddlePaddle/Paddle/pull/72076), [#72057](https://github.com/PaddlePaddle/Paddle/pull/72057), [#72077](https://github.com/PaddlePaddle/Paddle/pull/72077), [#72096](https://github.com/PaddlePaddle/Paddle/pull/72096), [#72085](https://github.com/PaddlePaddle/Paddle/pull/72085), [#72092](https://github.com/PaddlePaddle/Paddle/pull/72092), [#72110](https://github.com/PaddlePaddle/Paddle/pull/72110), [#72127](https://github.com/PaddlePaddle/Paddle/pull/72127), [#72111](https://github.com/PaddlePaddle/Paddle/pull/72111), [#72126](https://github.com/PaddlePaddle/Paddle/pull/72126), [#72135](https://github.com/PaddlePaddle/Paddle/pull/72135), [#72112](https://github.com/PaddlePaddle/Paddle/pull/72112), [#72131](https://github.com/PaddlePaddle/Paddle/pull/72131), [#70358](https://github.com/PaddlePaddle/Paddle/pull/70358), [#72125](https://github.com/PaddlePaddle/Paddle/pull/72125), [#72171](https://github.com/PaddlePaddle/Paddle/pull/72171), [#72160](https://github.com/PaddlePaddle/Paddle/pull/72160), [#72188](https://github.com/PaddlePaddle/Paddle/pull/72188), [#72197](https://github.com/PaddlePaddle/Paddle/pull/72197), [#72212](https://github.com/PaddlePaddle/Paddle/pull/72212), [#72211](https://github.com/PaddlePaddle/Paddle/pull/72211), [#72184](https://github.com/PaddlePaddle/Paddle/pull/72184), [#71897](https://github.com/PaddlePaddle/Paddle/pull/71897), [#72219](https://github.com/PaddlePaddle/Paddle/pull/72219), [#72218](https://github.com/PaddlePaddle/Paddle/pull/72218), [#72074](https://github.com/PaddlePaddle/Paddle/pull/72074), [#70330](https://github.com/PaddlePaddle/Paddle/pull/70330), [#70274](https://github.com/PaddlePaddle/Paddle/pull/70274), [#72295](https://github.com/PaddlePaddle/Paddle/pull/72295), [#72220](https://github.com/PaddlePaddle/Paddle/pull/72220), [#72343](https://github.com/PaddlePaddle/Paddle/pull/72343), [#72303](https://github.com/PaddlePaddle/Paddle/pull/72303), [#72296](https://github.com/PaddlePaddle/Paddle/pull/72296), [#72338](https://github.com/PaddlePaddle/Paddle/pull/72338), [#70001](https://github.com/PaddlePaddle/Paddle/pull/70001), [#70348](https://github.com/PaddlePaddle/Paddle/pull/70348), [#70329](https://github.com/PaddlePaddle/Paddle/pull/70329)
-- Inference 在 PIR 下功能机制完善
- - 执行器支持加载.json 模型[#65223](https://github.com/PaddlePaddle/Paddle/pull/65223)
- - 支持可控制开启 PIR 模式开关[#65596](https://github.com/PaddlePaddle/Paddle/pull/65596)
-- 大模型推理机制完善
- - 优化 gemm 算法搜索(cublaslt 全局搜索/离线缓存)[#65597](https://github.com/PaddlePaddle/Paddle/pull/65597), [#66132](https://github.com/PaddlePaddle/Paddle/pull/66132)
- - 增强类型系统兼容性(PD_VISIT_FLOATING_AND_HALF_TYPES)[#71022](https://github.com/PaddlePaddle/Paddle/pull/71022)
- - 优化注意力机制(多块 MMHA/XPU 支持)[#67211](https://github.com/PaddlePaddle/Paddle/pull/67211), [#68104](https://github.com/PaddlePaddle/Paddle/pull/68104)
+## 6. Framework Performance Optimization
-### 性能优化
+### New Features
-- OneDNN 升级到 3.6 版本(在 GNR/EMR 设备上模型推理性能获得普遍提升)[#69386](https://github.com/PaddlePaddle/Paddle/pull/69386)
-- 算子性能优化(layer_norm/top_p_sampling)[#65711](https://github.com/PaddlePaddle/Paddle/pull/65711)
-- 模型加载加速(常规/PIR 模型)[#69110](https://github.com/PaddlePaddle/Paddle/pull/69110), [#70219](https://github.com/PaddlePaddle/Paddle/pull/70219)
+- Made `acc_steps` configurable for `sharding_overlap`. [#72395](https://github.com/PaddlePaddle/Paddle/pull/72395)
### Bug Fixes
-- 修复 Predictor 在保存/加载 PIR 模型时有关问题。 [#65180](https://github.com/PaddlePaddle/Paddle/pull/65180),[#65019](https://github.com/PaddlePaddle/Paddle/pull/65019),[#65714](https://github.com/PaddlePaddle/Paddle/pull/65714),[#69619](https://github.com/PaddlePaddle/Paddle/pull/69619),[#67570](https://github.com/PaddlePaddle/Paddle/pull/67570),[#65595](https://github.com/PaddlePaddle/Paddle/pull/65595),[#69200](https://github.com/PaddlePaddle/Paddle/pull/69200)
-- 修复推理单测在 PIR、多硬件等场景下的执行问题。[#65763](https://github.com/PaddlePaddle/Paddle/pull/65763),[#66481](https://github.com/PaddlePaddle/Paddle/pull/66481),[#67105](https://github.com/PaddlePaddle/Paddle/pull/67105),[#67248](https://github.com/PaddlePaddle/Paddle/pull/67248),[#67470](https://github.com/PaddlePaddle/Paddle/pull/67470),[#67638](https://github.com/PaddlePaddle/Paddle/pull/67638),[#68135](https://github.com/PaddlePaddle/Paddle/pull/68135),[#68191](https://github.com/PaddlePaddle/Paddle/pull/68191),[#68211](https://github.com/PaddlePaddle/Paddle/pull/68211),[#68160](https://github.com/PaddlePaddle/Paddle/pull/68160),[#68185](https://github.com/PaddlePaddle/Paddle/pull/68185),[#68127](https://github.com/PaddlePaddle/Paddle/pull/68127),[#68887](https://github.com/PaddlePaddle/Paddle/pull/68887),[#69191](https://github.com/PaddlePaddle/Paddle/pull/69191), [#70961](https://github.com/PaddlePaddle/Paddle/pull/70961),[#68020](https://github.com/PaddlePaddle/Paddle/pull/68020),[#67923](https://github.com/PaddlePaddle/Paddle/pull/67923),[#67963](https://github.com/PaddlePaddle/Paddle/pull/67963),[#68482](https://github.com/PaddlePaddle/Paddle/pull/68482),[#68546](https://github.com/PaddlePaddle/Paddle/pull/68546),[#68593](https://github.com/PaddlePaddle/Paddle/pull/68593),[#68793](https://github.com/PaddlePaddle/Paddle/pull/68793)
-- 修复 Paddle TensorRT 转换与执行相关问题。[#66932](https://github.com/PaddlePaddle/Paddle/pull/66932),[#66655](https://github.com/PaddlePaddle/Paddle/pull/66655),[#67274](https://github.com/PaddlePaddle/Paddle/pull/67274),[#67504](https://github.com/PaddlePaddle/Paddle/pull/67504),[#65780](https://github.com/PaddlePaddle/Paddle/pull/65780),[#68170](https://github.com/PaddlePaddle/Paddle/pull/68170),[#68647](https://github.com/PaddlePaddle/Paddle/pull/68647),[#68776](https://github.com/PaddlePaddle/Paddle/pull/68776),[#69573](https://github.com/PaddlePaddle/Paddle/pull/69573),[#69598](https://github.com/PaddlePaddle/Paddle/pull/69598),[#69510](https://github.com/PaddlePaddle/Paddle/pull/69510),[#69864](https://github.com/PaddlePaddle/Paddle/pull/69864),[#69885](https://github.com/PaddlePaddle/Paddle/pull/69885),[#70161](https://github.com/PaddlePaddle/Paddle/pull/70161),[#70116](https://github.com/PaddlePaddle/Paddle/pull/70116),[#70791](https://github.com/PaddlePaddle/Paddle/pull/70791),[#70801](https://github.com/PaddlePaddle/Paddle/pull/70801),[#70824](https://github.com/PaddlePaddle/Paddle/pull/70824),[#70939](https://github.com/PaddlePaddle/Paddle/pull/70939), [#71143](https://github.com/PaddlePaddle/Paddle/pull/71143),[#71154](https://github.com/PaddlePaddle/Paddle/pull/71154),[#71163](https://github.com/PaddlePaddle/Paddle/pull/71163),[#71183](https://github.com/PaddlePaddle/Paddle/pull/71183),[#71233](https://github.com/PaddlePaddle/Paddle/pull/71233),[#71287](https://github.com/PaddlePaddle/Paddle/pull/71287),[#71319](https://github.com/PaddlePaddle/Paddle/pull/71319),[#67720](https://github.com/PaddlePaddle/Paddle/pull/67720),[#69671](https://github.com/PaddlePaddle/Paddle/pull/69671),[#70168](https://github.com/PaddlePaddle/Paddle/pull/70168),[#69957](https://github.com/PaddlePaddle/Paddle/pull/69957)
-- Paddle Inference 编译链接相关问题修复。[#65846](https://github.com/PaddlePaddle/Paddle/pull/65846),[#67081](https://github.com/PaddlePaddle/Paddle/pull/67081),[#63184](https://github.com/PaddlePaddle/Paddle/pull/63184)
-- 量化问题修复。[#67839](https://github.com/PaddlePaddle/Paddle/pull/67839),[#68049](https://github.com/PaddlePaddle/Paddle/pull/68049),[#70099](https://github.com/PaddlePaddle/Paddle/pull/70099), [#64878](https://github.com/PaddlePaddle/Paddle/pull/64878),[#65717](https://github.com/PaddlePaddle/Paddle/pull/65717),[#67552](https://github.com/PaddlePaddle/Paddle/pull/67552),[#67715](https://github.com/PaddlePaddle/Paddle/pull/67715)
-- OneDNN 推理问题修复。[#67836](https://github.com/PaddlePaddle/Paddle/pull/67836),[#68021](https://github.com/PaddlePaddle/Paddle/pull/68021),[#68132](https://github.com/PaddlePaddle/Paddle/pull/68132),[#71426](https://github.com/PaddlePaddle/Paddle/pull/71426),[#68057](https://github.com/PaddlePaddle/Paddle/pull/68057)
-- 内存问题修复。[#68631](https://github.com/PaddlePaddle/Paddle/pull/68631),[#69129](https://github.com/PaddlePaddle/Paddle/pull/69129),[#70314](https://github.com/PaddlePaddle/Paddle/pull/70314),[#67863](https://github.com/PaddlePaddle/Paddle/pull/67863)
-- Paddle Inference 支持 OpenVINO 问题修复。[#70212](https://github.com/PaddlePaddle/Paddle/pull/70212),[#70288](https://github.com/PaddlePaddle/Paddle/pull/70288),
-- Pass 相关问题修复。[#65349](https://github.com/PaddlePaddle/Paddle/pull/65349),[#65421](https://github.com/PaddlePaddle/Paddle/pull/65421),[#65677](https://github.com/PaddlePaddle/Paddle/pull/65677),[#66850](https://github.com/PaddlePaddle/Paddle/pull/66850),[#67443](https://github.com/PaddlePaddle/Paddle/pull/67443),[#67620](https://github.com/PaddlePaddle/Paddle/pull/67620),[#68158](https://github.com/PaddlePaddle/Paddle/pull/68158),[#68642](https://github.com/PaddlePaddle/Paddle/pull/68642),[#68837](https://github.com/PaddlePaddle/Paddle/pull/68837),[#68880](https://github.com/PaddlePaddle/Paddle/pull/68880),[#68935](https://github.com/PaddlePaddle/Paddle/pull/68935),[#69112](https://github.com/PaddlePaddle/Paddle/pull/69112),[#69205](https://github.com/PaddlePaddle/Paddle/pull/69205),[#69242](https://github.com/PaddlePaddle/Paddle/pull/69242),[#69352](https://github.com/PaddlePaddle/Paddle/pull/69352),[#69421](https://github.com/PaddlePaddle/Paddle/pull/69421),[#69690](https://github.com/PaddlePaddle/Paddle/pull/69690),
-- 其他类问题修复。[#70237](https://github.com/PaddlePaddle/Paddle/pull/70237),[#68173](https://github.com/PaddlePaddle/Paddle/pull/68173)
-- 修复 fused_moe 相关问题(测试/GEMM/WINT4/多架构兼容性/Bias 可选)[#67353](https://github.com/PaddlePaddle/Paddle/pull/67353), [#67396](https://github.com/PaddlePaddle/Paddle/pull/67396), [#67717](https://github.com/PaddlePaddle/Paddle/pull/67717), [#67794](https://github.com/PaddlePaddle/Paddle/pull/67794), [#67783](https://github.com/PaddlePaddle/Paddle/pull/67783)
-- 修复 block_attention 系列问题(GQA 差异/越界风险/多头支持)[#67175](https://github.com/PaddlePaddle/Paddle/pull/67175), [#69001](https://github.com/PaddlePaddle/Paddle/pull/69001), [#70763](https://github.com/PaddlePaddle/Paddle/pull/70763)
-- 修复 PIR 相关问题(布局转换/BF16 替换错误)[#66977](https://github.com/PaddlePaddle/Paddle/pull/66977), [#67830](https://github.com/PaddlePaddle/Paddle/pull/67830)
-- 修复分布式相关(allreduce 数据类型/参数同步)[#67449](https://github.com/PaddlePaddle/Paddle/pull/67449), [#69157](https://github.com/PaddlePaddle/Paddle/pull/69157)
-- 修复内核执行问题(前向反向冲突/默认流 argsort)[#67218](https://github.com/PaddlePaddle/Paddle/pull/67218), [#68374](https://github.com/PaddlePaddle/Paddle/pull/68374)
-- 其他关键修复(减小 C++库体积/修复 NeoX 格式下的 RoPE 计算/修复静态图执行)[#66041](https://github.com/PaddlePaddle/Paddle/pull/66041), [#66583](https://github.com/PaddlePaddle/Paddle/pull/66583), [#67580](https://github.com/PaddlePaddle/Paddle/pull/67580)
-
-### 其他修改
-
-- 代码清理与维护(API 弃用/编译警告修复)[#68048](https://github.com/PaddlePaddle/Paddle/pull/68048), [#70384](https://github.com/PaddlePaddle/Paddle/pull/70384)
-- 第三方集成优化(OpenVINO 子模块管理)[#70313](https://github.com/PaddlePaddle/Paddle/pull/70313), [#70425](https://github.com/PaddlePaddle/Paddle/pull/70425)
-
-## 8. 硬件适配
+- Fixed the `inplace` issue in the `c_softmax_with_cross_entropy_grad` operator. [#72366](https://github.com/PaddlePaddle/Paddle/pull/72366)
-针对昆仑、海光等平台持续进行功能完善和升级,提升用户体验
+### Enhancements
-### 新功能
-
-Added operators and improved functionality on Kunlunxin XPU. The ops involved include: flash attention/flash_attn_unpadded, multinomial, matmul, repeat_interleave, logsumexp, index_put_grad, mean_grad, pow, pow_grad, rsqrt, full, rms_norm, rms_norm_grad, put_along_axis, Cumsum, argmin, masked_select/grad, expand_v2/grad, all2all, expand, reduce_sum, reduce_max, reduce_min, moe, fused_linear_param_grad_add, adamw, clip/clip_grad, tan, acos, blha_get_max_len, gather/gather_grad, scatter/scatter_grad, round, index_select/index_select_grad, isfinite, isinf, quantize_linear, dequantize_linear, conv3d_transpose, logsumexp_grad, index_add_grad, eye, gather_element, tril, triu, set_value_grad, argmax, take_along_axis, and others.
-[#65413](https://github.com/PaddlePaddle/Paddle/pull/65413), [#64846](https://github.com/PaddlePaddle/Paddle/pull/64846), [#65656](https://github.com/PaddlePaddle/Paddle/pull/65656), [#65963](https://github.com/PaddlePaddle/Paddle/pull/65963), [#66143](https://github.com/PaddlePaddle/Paddle/pull/66143), [#66482](https://github.com/PaddlePaddle/Paddle/pull/66482), [#66585](https://github.com/PaddlePaddle/Paddle/pull/66585), [#67077](https://github.com/PaddlePaddle/Paddle/pull/67077), [#67173](https://github.com/PaddlePaddle/Paddle/pull/67173), [#67551](https://github.com/PaddlePaddle/Paddle/pull/67551), [#63989](https://github.com/PaddlePaddle/Paddle/pull/63989), [#67919](https://github.com/PaddlePaddle/Paddle/pull/67919), [#68052](https://github.com/PaddlePaddle/Paddle/pull/68052), [#68176](https://github.com/PaddlePaddle/Paddle/pull/68176), [#68408](https://github.com/PaddlePaddle/Paddle/pull/68408), [#68454](https://github.com/PaddlePaddle/Paddle/pull/68454), [#68478](https://github.com/PaddlePaddle/Paddle/pull/68478), [#68473](https://github.com/PaddlePaddle/Paddle/pull/68473), [#68453](https://github.com/PaddlePaddle/Paddle/pull/68453), [#68770](https://github.com/PaddlePaddle/Paddle/pull/68770), [#68933](https://github.com/PaddlePaddle/Paddle/pull/68933), [#69042](https://github.com/PaddlePaddle/Paddle/pull/69042), [#68713](https://github.com/PaddlePaddle/Paddle/pull/68713), [#69368](https://github.com/PaddlePaddle/Paddle/pull/69368), [#69723](https://github.com/PaddlePaddle/Paddle/pull/69723), [#69767](https://github.com/PaddlePaddle/Paddle/pull/69767), [#69898](https://github.com/PaddlePaddle/Paddle/pull/69898), [#69970](https://github.com/PaddlePaddle/Paddle/pull/69970), [#69771](https://github.com/PaddlePaddle/Paddle/pull/69771), [#70176](https://github.com/PaddlePaddle/Paddle/pull/70176), [#70428](https://github.com/PaddlePaddle/Paddle/pull/70428), [#70573](https://github.com/PaddlePaddle/Paddle/pull/70573), [#70576](https://github.com/PaddlePaddle/Paddle/pull/70576), [#70633](https://github.com/PaddlePaddle/Paddle/pull/70633), [#70114](https://github.com/PaddlePaddle/Paddle/pull/70114), [#70627](https://github.com/PaddlePaddle/Paddle/pull/70627), [#71038](https://github.com/PaddlePaddle/Paddle/pull/71038), [#71132](https://github.com/PaddlePaddle/Paddle/pull/71132), [#71228](https://github.com/PaddlePaddle/Paddle/pull/71228), [#71274](https://github.com/PaddlePaddle/Paddle/pull/71274), [#71364](https://github.com/PaddlePaddle/Paddle/pull/71364), [#71375](https://github.com/PaddlePaddle/Paddle/pull/71375), [#71431](https://github.com/PaddlePaddle/Paddle/pull/71431), [#71451](https://github.com/PaddlePaddle/Paddle/pull/71451), [#67585](https://github.com/PaddlePaddle/Paddle/pull/67585), [#67637](https://github.com/PaddlePaddle/Paddle/pull/67637), [#67914](https://github.com/PaddlePaddle/Paddle/pull/67914), [#67641](https://github.com/PaddlePaddle/Paddle/pull/67641), [#67913](https://github.com/PaddlePaddle/Paddle/pull/67913), [#67955](https://github.com/PaddlePaddle/Paddle/pull/67955), [#68411](https://github.com/PaddlePaddle/Paddle/pull/68411), [#68560](https://github.com/PaddlePaddle/Paddle/pull/68560), [#68423](https://github.com/PaddlePaddle/Paddle/pull/68423), [#68894](https://github.com/PaddlePaddle/Paddle/pull/68894), [#71053](https://github.com/PaddlePaddle/Paddle/pull/71053), [#71047](https://github.com/PaddlePaddle/Paddle/pull/71047), [#69056](https://github.com/PaddlePaddle/Paddle/pull/69056), [#70843](https://github.com/PaddlePaddle/Paddle/pull/70843), [#65653](https://github.com/PaddlePaddle/Paddle/pull/65653), [#68023](https://github.com/PaddlePaddle/Paddle/pull/68023), [#67780](https://github.com/PaddlePaddle/Paddle/pull/67780), [#68622](https://github.com/PaddlePaddle/Paddle/pull/68622), [#67215](https://github.com/PaddlePaddle/Paddle/pull/67215)
-
-海光 DCU 上添加 rocsolver、warpctc 的支持,并进行 OP 的添加和功能的完善,涉及的 ops 包括:flash_attention、hipblaslt、fastgelu、multiclass_nms3
-
-[#68066](https://github.com/PaddlePaddle/Paddle/pull/68066), [#69457](https://github.com/PaddlePaddle/Paddle/pull/69457), [#68603](https://github.com/PaddlePaddle/Paddle/pull/68603), [#65599](https://github.com/PaddlePaddle/Paddle/pull/65599), [#70587](https://github.com/PaddlePaddle/Paddle/pull/70587), [#71337](https://github.com/PaddlePaddle/Paddle/pull/71337), [#70173](https://github.com/PaddlePaddle/Paddle/pull/70173)
-
-### Bug 修复
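+- Performance optimization and acceleration: enabled cuDNN support for depthwise convolution, improving convolution efficiency; updated the pooling strategy and optimized permute memory operations, reducing CUDA memory usage; and sped up printing, accelerating debugging and log output. [#71796](https://github.com/PaddlePaddle/Paddle/pull/71796), [#73442](https://github.com/PaddlePaddle/Paddle/pull/73442), [#73563](https://github.com/PaddlePaddle/Paddle/pull/73563)
+- Functional enhancements and operation support: added the masked_fill operation and optimized boolean indexing, strengthening tensor mask handling; implemented the index_elementwise operation for index-based element-wise computation; and added execution strategies for pooling and reshape, making model operations more flexible. [#72788](https://github.com/PaddlePaddle/Paddle/pull/72788), [#72942](https://github.com/PaddlePaddle/Paddle/pull/72942)
+- Bug fixes and stability improvements: fixed partial-state support for fused_rms_norm under SPMD parallelism, and corrected the output-dimension calculation in slice and an indexing error in IndexGetStride, ensuring correct computation. [#72118](https://github.com/PaddlePaddle/Paddle/pull/72118), [#72223](https://github.com/PaddlePaddle/Paddle/pull/72223), [#73184](https://github.com/PaddlePaddle/Paddle/pull/73184), [#73237](https://github.com/PaddlePaddle/Paddle/pull/73237), [#73054](https://github.com/PaddlePaddle/Paddle/pull/73054)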
-昆仑芯 XPU 上进行 OP 的 Bug 修复
-[#65020](https://github.com/PaddlePaddle/Paddle/pull/65020), [#65251](https://github.com/PaddlePaddle/Paddle/pull/65251), [#65418](https://github.com/PaddlePaddle/Paddle/pull/65418), [#65387](https://github.com/PaddlePaddle/Paddle/pull/65387), [#65525](https://github.com/PaddlePaddle/Paddle/pull/65525), [#65613](https://github.com/PaddlePaddle/Paddle/pull/65613), [#65533](https://github.com/PaddlePaddle/Paddle/pull/65533), [#65705](https://github.com/PaddlePaddle/Paddle/pull/65705), [#65915](https://github.com/PaddlePaddle/Paddle/pull/65915), [#66238](https://github.com/PaddlePaddle/Paddle/pull/66238), [#66485](https://github.com/PaddlePaddle/Paddle/pull/66485), [#67349](https://github.com/PaddlePaddle/Paddle/pull/67349), [#67372](https://github.com/PaddlePaddle/Paddle/pull/67372), [#67276](https://github.com/PaddlePaddle/Paddle/pull/67276), [#67460](https://github.com/PaddlePaddle/Paddle/pull/67460), [#67496](https://github.com/PaddlePaddle/Paddle/pull/67496), [#67530](https://github.com/PaddlePaddle/Paddle/pull/67530), [#67828](https://github.com/PaddlePaddle/Paddle/pull/67828), [#68010](https://github.com/PaddlePaddle/Paddle/pull/68010), [#68157](https://github.com/PaddlePaddle/Paddle/pull/68157), [#68172](https://github.com/PaddlePaddle/Paddle/pull/68172), [#68388](https://github.com/PaddlePaddle/Paddle/pull/68388), [#68213](https://github.com/PaddlePaddle/Paddle/pull/68213), [#68501](https://github.com/PaddlePaddle/Paddle/pull/68501), [#68504](https://github.com/PaddlePaddle/Paddle/pull/68504), [#68585](https://github.com/PaddlePaddle/Paddle/pull/68585), [#69229](https://github.com/PaddlePaddle/Paddle/pull/69229), [#69374](https://github.com/PaddlePaddle/Paddle/pull/69374), [#69424](https://github.com/PaddlePaddle/Paddle/pull/69424), [#69440](https://github.com/PaddlePaddle/Paddle/pull/69440), [#69614](https://github.com/PaddlePaddle/Paddle/pull/69614), [#68542](https://github.com/PaddlePaddle/Paddle/pull/68542), 
[#69990](https://github.com/PaddlePaddle/Paddle/pull/69990), [#70351](https://github.com/PaddlePaddle/Paddle/pull/70351), [#70479](https://github.com/PaddlePaddle/Paddle/pull/70479), [#70431](https://github.com/PaddlePaddle/Paddle/pull/70431), [#70638](https://github.com/PaddlePaddle/Paddle/pull/70638), [#70856](https://github.com/PaddlePaddle/Paddle/pull/70856), [#70974](https://github.com/PaddlePaddle/Paddle/pull/70974), [#70973](https://github.com/PaddlePaddle/Paddle/pull/70973), [#71027](https://github.com/PaddlePaddle/Paddle/pull/71027), [#71062](https://github.com/PaddlePaddle/Paddle/pull/71062), [#71115](https://github.com/PaddlePaddle/Paddle/pull/71115), [#71110](https://github.com/PaddlePaddle/Paddle/pull/71110), [#70858](https://github.com/PaddlePaddle/Paddle/pull/70858), [#71147](https://github.com/PaddlePaddle/Paddle/pull/71147), [#71212](https://github.com/PaddlePaddle/Paddle/pull/71212), [#71361](https://github.com/PaddlePaddle/Paddle/pull/71361), [#71423](https://github.com/PaddlePaddle/Paddle/pull/71423), [#70859](https://github.com/PaddlePaddle/Paddle/pull/70859), [#71492](https://github.com/PaddlePaddle/Paddle/pull/71492), [#71493](https://github.com/PaddlePaddle/Paddle/pull/71493), [#69826](https://github.com/PaddlePaddle/Paddle/pull/69826), [#67341](https://github.com/PaddlePaddle/Paddle/pull/67341), [#68906](https://github.com/PaddlePaddle/Paddle/pull/68906), [#71171](https://github.com/PaddlePaddle/Paddle/pull/71171)
+### Performance Improvements
-Operator bug fixes on Hygon DCU
-[#69617](https://github.com/PaddlePaddle/Paddle/pull/69617), [#65716](https://github.com/PaddlePaddle/Paddle/pull/65716), [#66630](https://github.com/PaddlePaddle/Paddle/pull/66630), [#65399](https://github.com/PaddlePaddle/Paddle/pull/65399)
+- Faster Guard adaptation: reduces SOT end-to-end overhead. [#71900](https://github.com/PaddlePaddle/Paddle/pull/71900), [#71979](https://github.com/PaddlePaddle/Paddle/pull/71979), [#72081](https://github.com/PaddlePaddle/Paddle/pull/72081), [#72327](https://github.com/PaddlePaddle/Paddle/pull/72327), [#72564](https://github.com/PaddlePaddle/Paddle/pull/72564), [#72823](https://github.com/PaddlePaddle/Paddle/pull/72823)
+- Performance optimization and acceleration: optimized the operator scheduling strategy. Upgraded Flash Attention to v3 to reduce compute overhead. Fixed model performance bottlenecks to speed up inference and training. [#71937](https://github.com/PaddlePaddle/Paddle/pull/71937), [#71828](https://github.com/PaddlePaddle/Paddle/pull/71828), [#71461](https://github.com/PaddlePaddle/Paddle/pull/71461), [#72039](https://github.com/PaddlePaddle/Paddle/pull/72039), [#72228](https://github.com/PaddlePaddle/Paddle/pull/72228), [#72225](https://github.com/PaddlePaddle/Paddle/pull/72225), [#72623](https://github.com/PaddlePaddle/Paddle/pull/72623), [#72666](https://github.com/PaddlePaddle/Paddle/pull/72666), [#73147](https://github.com/PaddlePaddle/Paddle/pull/73147), [#73393](https://github.com/PaddlePaddle/Paddle/pull/73393)
+- Parallel computing: optimized the mesh resharding strategy in auto parallel, and implemented communication fusion for Sharding Stage with optimized logic, improving distributed training stability and reducing communication overhead. [#71969](https://github.com/PaddlePaddle/Paddle/pull/71969), [#72120](https://github.com/PaddlePaddle/Paddle/pull/72120), [#73279](https://github.com/PaddlePaddle/Paddle/pull/73279), [#73406](https://github.com/PaddlePaddle/Paddle/pull/73406)
-### Performance Optimization
+- Feature enhancements and fixes: optimized operator indexing and kernel scheduling logic. [#72625](https://github.com/PaddlePaddle/Paddle/pull/72625), [#72741](https://github.com/PaddlePaddle/Paddle/pull/72741), [#73082](https://github.com/PaddlePaddle/Paddle/pull/73082), [#73501](https://github.com/PaddlePaddle/Paddle/pull/73501)
-Kunlunxin XPU: upgraded basic components such as stream and optimized the performance of some operators.
-[#65102](https://github.com/PaddlePaddle/Paddle/pull/65102), [#69727](https://github.com/PaddlePaddle/Paddle/pull/69727), [#69899](https://github.com/PaddlePaddle/Paddle/pull/69899), [#69942](https://github.com/PaddlePaddle/Paddle/pull/69942), [#70025](https://github.com/PaddlePaddle/Paddle/pull/70025), [#70640](https://github.com/PaddlePaddle/Paddle/pull/70640)
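+- Model and operator support: added depthwise convolution in NHWC format to accommodate more hardware memory layouts. [#72121](https://github.com/PaddlePaddle/Paddle/pull/72121)
-### Hardware Base Library Upgrades
+## 7. Hardware Adaptation
-Upgraded base libraries to support Kunlunxin P800, plus basic component support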
-[#65494](https://github.com/PaddlePaddle/Paddle/pull/65494), [#65924](https://github.com/PaddlePaddle/Paddle/pull/65924), [#69752](https://github.com/PaddlePaddle/Paddle/pull/69752), [#70835](https://github.com/PaddlePaddle/Paddle/pull/70835), [#65554](https://github.com/PaddlePaddle/Paddle/pull/65554), [#66998](https://github.com/PaddlePaddle/Paddle/pull/66998), [#65278](https://github.com/PaddlePaddle/Paddle/pull/65278), [#70614](https://github.com/PaddlePaddle/Paddle/pull/70614), [#71012](https://github.com/PaddlePaddle/Paddle/pull/71012), [#71178](https://github.com/PaddlePaddle/Paddle/pull/71178), [#71168](https://github.com/PaddlePaddle/Paddle/pull/71168), [#68740](https://github.com/PaddlePaddle/Paddle/pull/68740), [#71100](https://github.com/PaddlePaddle/Paddle/pull/71100), [#65221](https://github.com/PaddlePaddle/Paddle/pull/65221), [#67983](https://github.com/PaddlePaddle/Paddle/pull/67983)
+Improved the hardware mechanism and provided a kernel reuse scheme for CUDA-like hardware.
-### Others
+### New Features
-Changes to op test and related modules
-[#65654](https://github.com/PaddlePaddle/Paddle/pull/65654), [#66233](https://github.com/PaddlePaddle/Paddle/pull/66233), [#66728](https://github.com/PaddlePaddle/Paddle/pull/66728), [#67959](https://github.com/PaddlePaddle/Paddle/pull/67959), [#68169](https://github.com/PaddlePaddle/Paddle/pull/68169), [#68418](https://github.com/PaddlePaddle/Paddle/pull/68418), [#68434](https://github.com/PaddlePaddle/Paddle/pull/68434), [#68445](https://github.com/PaddlePaddle/Paddle/pull/68445), [#68877](https://github.com/PaddlePaddle/Paddle/pull/68877), [#68993](https://github.com/PaddlePaddle/Paddle/pull/68993), [#69006](https://github.com/PaddlePaddle/Paddle/pull/69006), [#70471](https://github.com/PaddlePaddle/Paddle/pull/70471), [#70706](https://github.com/PaddlePaddle/Paddle/pull/70706), [#67777](https://github.com/PaddlePaddle/Paddle/pull/67777), [#65698](https://github.com/PaddlePaddle/Paddle/pull/65698), [#68433](https://github.com/PaddlePaddle/Paddle/pull/68433), [#65689](https://github.com/PaddlePaddle/Paddle/pull/65689)
+Building on the CustomDevice integration scheme, added a low-cost way to support CUDA-like hardware backends. Such backends can plug into Paddle, reuse most of the CUDA kernels that Paddle provides for the NVIDIA ecosystem at low cost, and stay decoupled from feature upgrades in the Paddle framework. This greatly lowers the cost of integrating and iterating on a hardware backend, encourages adoption, and supports a cooperative ecosystem built jointly by Paddle and hardware vendors.
+[#72604](https://github.com/PaddlePaddle/Paddle/pull/72604), [#72668](https://github.com/PaddlePaddle/Paddle/pull/72668), [#72758](https://github.com/PaddlePaddle/Paddle/pull/72758), [#72865](https://github.com/PaddlePaddle/Paddle/pull/72865), [#72910](https://github.com/PaddlePaddle/Paddle/pull/72910), [#73033](https://github.com/PaddlePaddle/Paddle/pull/73033), [#73145](https://github.com/PaddlePaddle/Paddle/pull/73145), [#73281](https://github.com/PaddlePaddle/Paddle/pull/73281), [#73079](https://github.com/PaddlePaddle/Paddle/pull/73079)
-## 9. Environment Updates
+XPU basic capabilities: added kernels, extended data types, and added branches in the XPU environment
+[#71424](https://github.com/PaddlePaddle/Paddle/pull/71424), [#71809](https://github.com/PaddlePaddle/Paddle/pull/71809), [#71594](https://github.com/PaddlePaddle/Paddle/pull/71594), [#71779](https://github.com/PaddlePaddle/Paddle/pull/71779), [#71756](https://github.com/PaddlePaddle/Paddle/pull/71756), [#71573](https://github.com/PaddlePaddle/Paddle/pull/71573), [#71883](https://github.com/PaddlePaddle/Paddle/pull/71883), [#71954](https://github.com/PaddlePaddle/Paddle/pull/71954), [#71931](https://github.com/PaddlePaddle/Paddle/pull/71931), [#72280](https://github.com/PaddlePaddle/Paddle/pull/72280), [#72361](https://github.com/PaddlePaddle/Paddle/pull/72361), [#72406](https://github.com/PaddlePaddle/Paddle/pull/72406), [#72528](https://github.com/PaddlePaddle/Paddle/pull/72528), [#72752](https://github.com/PaddlePaddle/Paddle/pull/72752), [#72852](https://github.com/PaddlePaddle/Paddle/pull/72852), [#72982](https://github.com/PaddlePaddle/Paddle/pull/72982), [#73357](https://github.com/PaddlePaddle/Paddle/pull/73357), [#73414](https://github.com/PaddlePaddle/Paddle/pull/73414), [#73464](https://github.com/PaddlePaddle/Paddle/pull/73464), [#73234](https://github.com/PaddlePaddle/Paddle/pull/73234), [#71776](https://github.com/PaddlePaddle/Paddle/pull/71776)
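-- Improved framework stability and cross-platform compatibility, fixed test coverage and build environment compatibility issues, and strengthened support for Windows, XPU, DCU, and other platforms. Also streamlined the code structure by removing deprecated code and unused dependencies to lower maintenance costs, upgraded key dependencies such as CUDA, and further optimized the CI/CD pipeline to speed up builds and improve overall system stability.
+Extended data types for DCU kernels
+[#73129](https://github.com/PaddlePaddle/Paddle/pull/73129)
### Bug Fixes
-- Improved the CI/CD pipeline, fixed test cases, and resolved build and installation issues across environments, improving framework stability and cross-environment compatibility.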
- [#65627](https://github.com/PaddlePaddle/Paddle/pull/65627), [#65736](https://github.com/PaddlePaddle/Paddle/pull/65736), [#65900](https://github.com/PaddlePaddle/Paddle/pull/65900), [#66069](https://github.com/PaddlePaddle/Paddle/pull/66069), [#67000](https://github.com/PaddlePaddle/Paddle/pull/67000), [#67312](https://github.com/PaddlePaddle/Paddle/pull/67312), [#67432](https://github.com/PaddlePaddle/Paddle/pull/67432), [#67540](https://github.com/PaddlePaddle/Paddle/pull/67540), [#67670](https://github.com/PaddlePaddle/Paddle/pull/67670), [#68449](https://github.com/PaddlePaddle/Paddle/pull/68449), [#70806](https://github.com/PaddlePaddle/Paddle/pull/70806), [#65665](https://github.com/PaddlePaddle/Paddle/pull/65665), [#65652](https://github.com/PaddlePaddle/Paddle/pull/65652), [#70644](https://github.com/PaddlePaddle/Paddle/pull/70644), [#68119](https://github.com/PaddlePaddle/Paddle/pull/68119), [#68466](https://github.com/PaddlePaddle/Paddle/pull/68466), [#68858](https://github.com/PaddlePaddle/Paddle/pull/68858), [#68788](https://github.com/PaddlePaddle/Paddle/pull/68788), [#68934](https://github.com/PaddlePaddle/Paddle/pull/68934), [#69883](https://github.com/PaddlePaddle/Paddle/pull/69883), [#69924](https://github.com/PaddlePaddle/Paddle/pull/69924), [#71187](https://github.com/PaddlePaddle/Paddle/pull/71187), [#70798](https://github.com/PaddlePaddle/Paddle/pull/70798), [#71248](https://github.com/PaddlePaddle/Paddle/pull/71248), [#70512](https://github.com/PaddlePaddle/Paddle/pull/70512), [#71363](https://github.com/PaddlePaddle/Paddle/pull/71363), [#71438](https://github.com/PaddlePaddle/Paddle/pull/71438), [#71291](https://github.com/PaddlePaddle/Paddle/pull/71291)
-
-### Improvements and Upgrades
+Fixed XPU execution issues
+[#71852](https://github.com/PaddlePaddle/Paddle/pull/71852), [#71966](https://github.com/PaddlePaddle/Paddle/pull/71966), [#72005](https://github.com/PaddlePaddle/Paddle/pull/72005), [#71908](https://github.com/PaddlePaddle/Paddle/pull/71908), [#72431](https://github.com/PaddlePaddle/Paddle/pull/72431), [#72519](https://github.com/PaddlePaddle/Paddle/pull/72519), [#72734](https://github.com/PaddlePaddle/Paddle/pull/72734), [#72763](https://github.com/PaddlePaddle/Paddle/pull/72763), [#72762](https://github.com/PaddlePaddle/Paddle/pull/72762), [#72890](https://github.com/PaddlePaddle/Paddle/pull/72890), [#72867](https://github.com/PaddlePaddle/Paddle/pull/72867), [#73071](https://github.com/PaddlePaddle/Paddle/pull/73071), [#73004](https://github.com/PaddlePaddle/Paddle/pull/73004), [#72726](https://github.com/PaddlePaddle/Paddle/pull/72726), [#73113](https://github.com/PaddlePaddle/Paddle/pull/73113), [#73127](https://github.com/PaddlePaddle/Paddle/pull/73127), [#73025](https://github.com/PaddlePaddle/Paddle/pull/73025), [#73301](https://github.com/PaddlePaddle/Paddle/pull/73301), [#73292](https://github.com/PaddlePaddle/Paddle/pull/73292), [#73272](https://github.com/PaddlePaddle/Paddle/pull/73272), [#73305](https://github.com/PaddlePaddle/Paddle/pull/73305), [#73356](https://github.com/PaddlePaddle/Paddle/pull/73356), [#73438](https://github.com/PaddlePaddle/Paddle/pull/73438), [#72041](https://github.com/PaddlePaddle/Paddle/pull/72041), [#72275](https://github.com/PaddlePaddle/Paddle/pull/72275), [#72787](https://github.com/PaddlePaddle/Paddle/pull/72787), [#73504](https://github.com/PaddlePaddle/Paddle/pull/73504), [#73290](https://github.com/PaddlePaddle/Paddle/pull/73290)
-- Environment upgrades
- [#69491](https://github.com/PaddlePaddle/Paddle/pull/69491), [#66560](https://github.com/PaddlePaddle/Paddle/pull/66560), [#65686](https://github.com/PaddlePaddle/Paddle/pull/65686), [#71177](https://github.com/PaddlePaddle/Paddle/pull/71177), [#71284](https://github.com/PaddlePaddle/Paddle/pull/71284), [#69791](https://github.com/PaddlePaddle/Paddle/pull/69791), [#69349](https://github.com/PaddlePaddle/Paddle/pull/69349), [#70944](https://github.com/PaddlePaddle/Paddle/pull/70944), [#65411](https://github.com/PaddlePaddle/Paddle/pull/65411)
-- Pipeline consolidation
- [#66815](https://github.com/PaddlePaddle/Paddle/pull/66815), [#67306](https://github.com/PaddlePaddle/Paddle/pull/67306)
-- DCU/NPU/KUNLUN pipeline improvements
- [#67516](https://github.com/PaddlePaddle/Paddle/pull/67516), [#67629](https://github.com/PaddlePaddle/Paddle/pull/67629), [#67987](https://github.com/PaddlePaddle/Paddle/pull/67987), [#69903](https://github.com/PaddlePaddle/Paddle/pull/69903), [#68448](https://github.com/PaddlePaddle/Paddle/pull/68448), [#70401](https://github.com/PaddlePaddle/Paddle/pull/70401), [#71192](https://github.com/PaddlePaddle/Paddle/pull/71192), [#71197](https://github.com/PaddlePaddle/Paddle/pull/71197), [#68027](https://github.com/PaddlePaddle/Paddle/pull/68027)
-- Windows environment support
- [#70390](https://github.com/PaddlePaddle/Paddle/pull/70390), [#70785](https://github.com/PaddlePaddle/Paddle/pull/70785), [#71286](https://github.com/PaddlePaddle/Paddle/pull/71286), [#71414](https://github.com/PaddlePaddle/Paddle/pull/71414), [#68901](https://github.com/PaddlePaddle/Paddle/pull/68901)
-- Third-party library improvements
- [#71419](https://github.com/PaddlePaddle/Paddle/pull/71419)
-- Other optimizations to improve CI stability and efficiency
- [#67574](https://github.com/PaddlePaddle/Paddle/pull/67574), [#69058](https://github.com/PaddlePaddle/Paddle/pull/69058), [#70610](https://github.com/PaddlePaddle/Paddle/pull/70610), [#67093](https://github.com/PaddlePaddle/Paddle/pull/67093), [#69037](https://github.com/PaddlePaddle/Paddle/pull/69037), [#65213](https://github.com/PaddlePaddle/Paddle/pull/65213), [#65913](https://github.com/PaddlePaddle/Paddle/pull/65913), [#65947](https://github.com/PaddlePaddle/Paddle/pull/65947), [#66479](https://github.com/PaddlePaddle/Paddle/pull/66479), [#71054](https://github.com/PaddlePaddle/Paddle/pull/71054), [#71396](https://github.com/PaddlePaddle/Paddle/pull/71396)
+## 8. Installation Environment Adaptation
-### New Features
-
-- Added the GitHub Actions mechanism
- [#70571](https://github.com/PaddlePaddle/Paddle/pull/70571), [#70626](https://github.com/PaddlePaddle/Paddle/pull/70626), [#71325](https://github.com/PaddlePaddle/Paddle/pull/71325), [#71344](https://github.com/PaddlePaddle/Paddle/pull/71344), [#71353](https://github.com/PaddlePaddle/Paddle/pull/71353), [#71322](https://github.com/PaddlePaddle/Paddle/pull/71322), [#70415](https://github.com/PaddlePaddle/Paddle/pull/70415), [#70465](https://github.com/PaddlePaddle/Paddle/pull/70465), [#70524](https://github.com/PaddlePaddle/Paddle/pull/70524), [#70550](https://github.com/PaddlePaddle/Paddle/pull/70550), [#70564](https://github.com/PaddlePaddle/Paddle/pull/70564), [#70579](https://github.com/PaddlePaddle/Paddle/pull/70579), [#70580](https://github.com/PaddlePaddle/Paddle/pull/70580), [#70963](https://github.com/PaddlePaddle/Paddle/pull/70963), [#71200](https://github.com/PaddlePaddle/Paddle/pull/71200), [#71261](https://github.com/PaddlePaddle/Paddle/pull/71261), [#71265](https://github.com/PaddlePaddle/Paddle/pull/71265)
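+Improved framework stability and cross-platform compatibility and fixed build and installation failures on different platforms. Upgraded key dependencies such as CUDA and further optimized the CI/CD pipeline, speeding up builds and improving overall system stability. Dropped maintenance of building and installing under Python 3.8.
-### Deprecations
+### Bug Fixes
-- Cleaned up deprecated code and dependencies, including removing Python libraries that are no longer needed and simplifying the build configuration, to lower maintenance costs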
- [#65635](https://github.com/PaddlePaddle/Paddle/pull/65635), [#67542](https://github.com/PaddlePaddle/Paddle/pull/67542), [#67609](https://github.com/PaddlePaddle/Paddle/pull/67604), [#69572](https://github.com/PaddlePaddle/Paddle/pull/69572), [#68150](https://github.com/PaddlePaddle/Paddle/pull/68150), [#67604](https://github.com/PaddlePaddle/Paddle/pull/67604), [#68561](https://github.com/PaddlePaddle/Paddle/pull/68561), [#68904](https://github.com/PaddlePaddle/Paddle/pull/68904), [#67219](https://github.com/PaddlePaddle/Paddle/pull/67219)
+- Fixed compilation errors when building third-party libraries with Clang 17. [#72524](https://github.com/PaddlePaddle/Paddle/pull/72524)
+- Fixed compilation issues with CUDA 12.9. [#72808](https://github.com/PaddlePaddle/Paddle/pull/72808), [#72841](https://github.com/PaddlePaddle/Paddle/pull/72841), [#72978](https://github.com/PaddlePaddle/Paddle/pull/72978), [#73360](https://github.com/PaddlePaddle/Paddle/pull/73360)
+- Fixed compilation issues with GCC 13.3. [#73144](https://github.com/PaddlePaddle/Paddle/pull/73144)
+- Fixed compilation issues when WITH_PIP_CUDA_LIBRARIES=ON. [#72907](https://github.com/PaddlePaddle/Paddle/pull/72907)
+- Fixed compilation issues when WITH_NVSHMEM=ON. [#73368](https://github.com/PaddlePaddle/Paddle/pull/73368)
-## 10. Others
+### Enhancements
-- Changes unrelated to user-facing behavior, including deprecated code cleanup, code migration, unit test cleanup, and upgrades to debugging or monitoring mechanisms.
+- Avoided copying temporary files produced when compiling custom operators. [#73196](https://github.com/PaddlePaddle/Paddle/pull/73196)
+- Improved warning messages. [#72877](https://github.com/PaddlePaddle/Paddle/pull/72877)
-### Developer-Related Content
+### Developer-Related
-- Removed unused debugging code and migrated code
- [#65256](https://github.com/PaddlePaddle/Paddle/pull/65256), [#65782](https://github.com/PaddlePaddle/Paddle/pull/65782), [#65836](https://github.com/PaddlePaddle/Paddle/pull/65836), [#65840](https://github.com/PaddlePaddle/Paddle/pull/65840), [#65862](https://github.com/PaddlePaddle/Paddle/pull/65862), [#65863](https://github.com/PaddlePaddle/Paddle/pull/65863), [#65987](https://github.com/PaddlePaddle/Paddle/pull/65987), [#66547](https://github.com/PaddlePaddle/Paddle/pull/66547), [#66556](https://github.com/PaddlePaddle/Paddle/pull/66556), [#66645](https://github.com/PaddlePaddle/Paddle/pull/66645), [#66646](https://github.com/PaddlePaddle/Paddle/pull/66646), [#66648](https://github.com/PaddlePaddle/Paddle/pull/66648), [#66672](https://github.com/PaddlePaddle/Paddle/pull/66672), [#66783](https://github.com/PaddlePaddle/Paddle/pull/66783), [#66083](https://github.com/PaddlePaddle/Paddle/pull/66083), [#65562](https://github.com/PaddlePaddle/Paddle/pull/65562), [#66564](https://github.com/PaddlePaddle/Paddle/pull/66564), [#66370](https://github.com/PaddlePaddle/Paddle/pull/66370), [#66912](https://github.com/PaddlePaddle/Paddle/pull/66912), [#66913](https://github.com/PaddlePaddle/Paddle/pull/66913), [#66914](https://github.com/PaddlePaddle/Paddle/pull/66914), [#66915](https://github.com/PaddlePaddle/Paddle/pull/66915), [#66664](https://github.com/PaddlePaddle/Paddle/pull/66664), [#66671](https://github.com/PaddlePaddle/Paddle/pull/66671), [#66121](https://github.com/PaddlePaddle/Paddle/pull/66121), [#65907](https://github.com/PaddlePaddle/Paddle/pull/65907), [#65949](https://github.com/PaddlePaddle/Paddle/pull/65949), [#65950](https://github.com/PaddlePaddle/Paddle/pull/65950), [#65954](https://github.com/PaddlePaddle/Paddle/pull/65954), [#66545](https://github.com/PaddlePaddle/Paddle/pull/66545), [#66649](https://github.com/PaddlePaddle/Paddle/pull/66649), [#66900](https://github.com/PaddlePaddle/Paddle/pull/66900), 
[#66901](https://github.com/PaddlePaddle/Paddle/pull/66901), [#66902](https://github.com/PaddlePaddle/Paddle/pull/66902), [#66903](https://github.com/PaddlePaddle/Paddle/pull/66903), [#66904](https://github.com/PaddlePaddle/Paddle/pull/66904), [#66906](https://github.com/PaddlePaddle/Paddle/pull/66906), [#66907](https://github.com/PaddlePaddle/Paddle/pull/66907), [#66908](https://github.com/PaddlePaddle/Paddle/pull/66908), [#66909](https://github.com/PaddlePaddle/Paddle/pull/66909), [#66549](https://github.com/PaddlePaddle/Paddle/pull/66549), [#66555](https://github.com/PaddlePaddle/Paddle/pull/66555), [#66647](https://github.com/PaddlePaddle/Paddle/pull/66647), [#66898](https://github.com/PaddlePaddle/Paddle/pull/66898), [#66886](https://github.com/PaddlePaddle/Paddle/pull/66886), [#66042](https://github.com/PaddlePaddle/Paddle/pull/66042), [#66043](https://github.com/PaddlePaddle/Paddle/pull/66043), [#66045](https://github.com/PaddlePaddle/Paddle/pull/66045), [#66046](https://github.com/PaddlePaddle/Paddle/pull/66046), [#65826](https://github.com/PaddlePaddle/Paddle/pull/65826), [#65825](https://github.com/PaddlePaddle/Paddle/pull/65825), [#65827](https://github.com/PaddlePaddle/Paddle/pull/65827), [#65829](https://github.com/PaddlePaddle/Paddle/pull/65829), [#65830](https://github.com/PaddlePaddle/Paddle/pull/65830), [#65831](https://github.com/PaddlePaddle/Paddle/pull/65831), [#66081](https://github.com/PaddlePaddle/Paddle/pull/66081), [#66082](https://github.com/PaddlePaddle/Paddle/pull/66082), [#66087](https://github.com/PaddlePaddle/Paddle/pull/66087), [#65980](https://github.com/PaddlePaddle/Paddle/pull/65980), [#65981](https://github.com/PaddlePaddle/Paddle/pull/65981), [#65983](https://github.com/PaddlePaddle/Paddle/pull/65983), [#65985](https://github.com/PaddlePaddle/Paddle/pull/65985), [#65979](https://github.com/PaddlePaddle/Paddle/pull/65979), [#65986](https://github.com/PaddlePaddle/Paddle/pull/65986), 
[#65988](https://github.com/PaddlePaddle/Paddle/pull/65988), [#65989](https://github.com/PaddlePaddle/Paddle/pull/65989), [#66682](https://github.com/PaddlePaddle/Paddle/pull/66682), [#66717](https://github.com/PaddlePaddle/Paddle/pull/66717), [#65802](https://github.com/PaddlePaddle/Paddle/pull/65802), [#66159](https://github.com/PaddlePaddle/Paddle/pull/66159), [#66147](https://github.com/PaddlePaddle/Paddle/pull/66147), [#66149](https://github.com/PaddlePaddle/Paddle/pull/66149), [#66150](https://github.com/PaddlePaddle/Paddle/pull/66150), [#65798](https://github.com/PaddlePaddle/Paddle/pull/65798), [#65731](https://github.com/PaddlePaddle/Paddle/pull/65731), [#66145](https://github.com/PaddlePaddle/Paddle/pull/66145), [#66086](https://github.com/PaddlePaddle/Paddle/pull/66086), [#65781](https://github.com/PaddlePaddle/Paddle/pull/65781), [#65837](https://github.com/PaddlePaddle/Paddle/pull/65837), [#65828](https://github.com/PaddlePaddle/Paddle/pull/65828), [#65864](https://github.com/PaddlePaddle/Paddle/pull/65864), [#65959](https://github.com/PaddlePaddle/Paddle/pull/65959), [#65706](https://github.com/PaddlePaddle/Paddle/pull/65706), [#66918](https://github.com/PaddlePaddle/Paddle/pull/66918), [#66191](https://github.com/PaddlePaddle/Paddle/pull/66191), [#66689](https://github.com/PaddlePaddle/Paddle/pull/66689), [#66808](https://github.com/PaddlePaddle/Paddle/pull/66808), [#65424](https://github.com/PaddlePaddle/Paddle/pull/65424), [#65452](https://github.com/PaddlePaddle/Paddle/pull/65452), [#65463](https://github.com/PaddlePaddle/Paddle/pull/65463), [#65478](https://github.com/PaddlePaddle/Paddle/pull/65478), [#65339](https://github.com/PaddlePaddle/Paddle/pull/65339)
-- Standardized code namespaces
- [#64755](https://github.com/PaddlePaddle/Paddle/pull/64755), [#64765](https://github.com/PaddlePaddle/Paddle/pull/64765), [#64767](https://github.com/PaddlePaddle/Paddle/pull/64767), [#64770](https://github.com/PaddlePaddle/Paddle/pull/64770), [#64775](https://github.com/PaddlePaddle/Paddle/pull/64775), [#64776](https://github.com/PaddlePaddle/Paddle/pull/64776), [#64757](https://github.com/PaddlePaddle/Paddle/pull/64757), [#64780](https://github.com/PaddlePaddle/Paddle/pull/64780), [#64777](https://github.com/PaddlePaddle/Paddle/pull/64777), [#64779](https://github.com/PaddlePaddle/Paddle/pull/64779), [#64758](https://github.com/PaddlePaddle/Paddle/pull/64758), [#64759](https://github.com/PaddlePaddle/Paddle/pull/64759), [#64762](https://github.com/PaddlePaddle/Paddle/pull/64762)
-- Modified the operator list
- [#66573](https://github.com/PaddlePaddle/Paddle/pull/66573), [#65598](https://github.com/PaddlePaddle/Paddle/pull/65598), [#65100](https://github.com/PaddlePaddle/Paddle/pull/65100), [#65385](https://github.com/PaddlePaddle/Paddle/pull/65385), [#65192](https://github.com/PaddlePaddle/Paddle/pull/65192), [#65118](https://github.com/PaddlePaddle/Paddle/pull/65118), [#65108](https://github.com/PaddlePaddle/Paddle/pull/65108), [#65153](https://github.com/PaddlePaddle/Paddle/pull/65153), [#65465](https://github.com/PaddlePaddle/Paddle/pull/65465), [#65128](https://github.com/PaddlePaddle/Paddle/pull/65128), [#65420](https://github.com/PaddlePaddle/Paddle/pull/65420), [#65099](https://github.com/PaddlePaddle/Paddle/pull/65099), [#65207](https://github.com/PaddlePaddle/Paddle/pull/65207), [#66066](https://github.com/PaddlePaddle/Paddle/pull/66066), [#65400](https://github.com/PaddlePaddle/Paddle/pull/65400), [#65160](https://github.com/PaddlePaddle/Paddle/pull/65160), [#65195](https://github.com/PaddlePaddle/Paddle/pull/65195), [#65445](https://github.com/PaddlePaddle/Paddle/pull/65445), [#65479](https://github.com/PaddlePaddle/Paddle/pull/65479), [#65193](https://github.com/PaddlePaddle/Paddle/pull/65193), [#65401](https://github.com/PaddlePaddle/Paddle/pull/65401), [#66724](https://github.com/PaddlePaddle/Paddle/pull/66724), [#65164](https://github.com/PaddlePaddle/Paddle/pull/65164), [#65466](https://github.com/PaddlePaddle/Paddle/pull/65466), [#65661](https://github.com/PaddlePaddle/Paddle/pull/65661), [#65897](https://github.com/PaddlePaddle/Paddle/pull/65897), [#66022](https://github.com/PaddlePaddle/Paddle/pull/66022), [#65313](https://github.com/PaddlePaddle/Paddle/pull/65313), [#65616](https://github.com/PaddlePaddle/Paddle/pull/65616), [#65588](https://github.com/PaddlePaddle/Paddle/pull/65588), [#65174](https://github.com/PaddlePaddle/Paddle/pull/65174), [#65402](https://github.com/PaddlePaddle/Paddle/pull/65402), 
[#65154](https://github.com/PaddlePaddle/Paddle/pull/65154), [#65151](https://github.com/PaddlePaddle/Paddle/pull/65151), [#65098](https://github.com/PaddlePaddle/Paddle/pull/65098), [#64953](https://github.com/PaddlePaddle/Paddle/pull/64953), [#65122](https://github.com/PaddlePaddle/Paddle/pull/65122), [#65590](https://github.com/PaddlePaddle/Paddle/pull/65590), [#65152](https://github.com/PaddlePaddle/Paddle/pull/65152)
-- Retired the legacy executor in the Paddle framework
- [#65077](https://github.com/PaddlePaddle/Paddle/pull/65077), [#65340](https://github.com/PaddlePaddle/Paddle/pull/65340)
-- Improved error messages
- [#66668](https://github.com/PaddlePaddle/Paddle/pull/66668), [#66675](https://github.com/PaddlePaddle/Paddle/pull/66675), [#66605](https://github.com/PaddlePaddle/Paddle/pull/66605), [#66613](https://github.com/PaddlePaddle/Paddle/pull/66613), [#66507](https://github.com/PaddlePaddle/Paddle/pull/66507), [#66700](https://github.com/PaddlePaddle/Paddle/pull/66700), [#66739](https://github.com/PaddlePaddle/Paddle/pull/66739), [#66719](https://github.com/PaddlePaddle/Paddle/pull/66719), [#66733](https://github.com/PaddlePaddle/Paddle/pull/66733), [#66552](https://github.com/PaddlePaddle/Paddle/pull/66552), [#66548](https://github.com/PaddlePaddle/Paddle/pull/66548), [#66623](https://github.com/PaddlePaddle/Paddle/pull/66623), [#66702](https://github.com/PaddlePaddle/Paddle/pull/66702), [#66705](https://github.com/PaddlePaddle/Paddle/pull/66705), [#66718](https://github.com/PaddlePaddle/Paddle/pull/66718), [#66727](https://github.com/PaddlePaddle/Paddle/pull/66727), [#66860](https://github.com/PaddlePaddle/Paddle/pull/66860), [#66869](https://github.com/PaddlePaddle/Paddle/pull/66869), [#66933](https://github.com/PaddlePaddle/Paddle/pull/66933), [#66939](https://github.com/PaddlePaddle/Paddle/pull/66939), [#66553](https://github.com/PaddlePaddle/Paddle/pull/66553), [#66774](https://github.com/PaddlePaddle/Paddle/pull/66774), [#66794](https://github.com/PaddlePaddle/Paddle/pull/66794), [#66551](https://github.com/PaddlePaddle/Paddle/pull/66551), [#66540](https://github.com/PaddlePaddle/Paddle/pull/66540), [#66617](https://github.com/PaddlePaddle/Paddle/pull/66617), [#66841](https://github.com/PaddlePaddle/Paddle/pull/66841), [#66788](https://github.com/PaddlePaddle/Paddle/pull/66788), [#66954](https://github.com/PaddlePaddle/Paddle/pull/66954), [#66698](https://github.com/PaddlePaddle/Paddle/pull/66698), [#66782](https://github.com/PaddlePaddle/Paddle/pull/66782), [#66844](https://github.com/PaddlePaddle/Paddle/pull/66844), 
[#66443](https://github.com/PaddlePaddle/Paddle/pull/66443), [#66455](https://github.com/PaddlePaddle/Paddle/pull/66455), [#66517](https://github.com/PaddlePaddle/Paddle/pull/66517), [#66804](https://github.com/PaddlePaddle/Paddle/pull/66804), [#66802](https://github.com/PaddlePaddle/Paddle/pull/66802), [#66536](https://github.com/PaddlePaddle/Paddle/pull/66536), [#66707](https://github.com/PaddlePaddle/Paddle/pull/66707), [#66525](https://github.com/PaddlePaddle/Paddle/pull/66525), [#66753](https://github.com/PaddlePaddle/Paddle/pull/66753), [#66550](https://github.com/PaddlePaddle/Paddle/pull/66550), [#66857](https://github.com/PaddlePaddle/Paddle/pull/66857), [#66471](https://github.com/PaddlePaddle/Paddle/pull/66471), [#66628](https://github.com/PaddlePaddle/Paddle/pull/66628), [#66469](https://github.com/PaddlePaddle/Paddle/pull/66469), [#66775](https://github.com/PaddlePaddle/Paddle/pull/66775), [#66506](https://github.com/PaddlePaddle/Paddle/pull/66506), [#66780](https://github.com/PaddlePaddle/Paddle/pull/66780), [#66953](https://github.com/PaddlePaddle/Paddle/pull/66953), [#66695](https://github.com/PaddlePaddle/Paddle/pull/66695), [#66603](https://github.com/PaddlePaddle/Paddle/pull/66603), [#66491](https://github.com/PaddlePaddle/Paddle/pull/66491), [#66715](https://github.com/PaddlePaddle/Paddle/pull/66715), [#66632](https://github.com/PaddlePaddle/Paddle/pull/66632), [#66594](https://github.com/PaddlePaddle/Paddle/pull/66594), [#66615](https://github.com/PaddlePaddle/Paddle/pull/66615), [#66578](https://github.com/PaddlePaddle/Paddle/pull/66578), [#66534](https://github.com/PaddlePaddle/Paddle/pull/66534), [#66569](https://github.com/PaddlePaddle/Paddle/pull/66569), [#66529](https://github.com/PaddlePaddle/Paddle/pull/66529), [#66530](https://github.com/PaddlePaddle/Paddle/pull/66530), [#66522](https://github.com/PaddlePaddle/Paddle/pull/66522), [#66789](https://github.com/PaddlePaddle/Paddle/pull/66789), 
[#66600](https://github.com/PaddlePaddle/Paddle/pull/66600), [#66511](https://github.com/PaddlePaddle/Paddle/pull/66511), [#66512](https://github.com/PaddlePaddle/Paddle/pull/66512), [#66527](https://github.com/PaddlePaddle/Paddle/pull/66527), [#66518](https://github.com/PaddlePaddle/Paddle/pull/66518), [#66958](https://github.com/PaddlePaddle/Paddle/pull/66958), [#66532](https://github.com/PaddlePaddle/Paddle/pull/66532), [#65258](https://github.com/PaddlePaddle/Paddle/pull/65258), [#66487](https://github.com/PaddlePaddle/Paddle/pull/66487), [#66876](https://github.com/PaddlePaddle/Paddle/pull/66876), [#66832](https://github.com/PaddlePaddle/Paddle/pull/66832), [#66872](https://github.com/PaddlePaddle/Paddle/pull/66872), [#66830](https://github.com/PaddlePaddle/Paddle/pull/66830), [#66708](https://github.com/PaddlePaddle/Paddle/pull/66708), [#66502](https://github.com/PaddlePaddle/Paddle/pull/66502), [#66521](https://github.com/PaddlePaddle/Paddle/pull/66521), [#66592](https://github.com/PaddlePaddle/Paddle/pull/66592)
+- Maintenance and upgrades for build and installation. [#71911](https://github.com/PaddlePaddle/Paddle/pull/71911), [#73005](https://github.com/PaddlePaddle/Paddle/pull/73005)
+- Image maintenance and updates. [#71065](https://github.com/PaddlePaddle/Paddle/pull/71065), [#71821](https://github.com/PaddlePaddle/Paddle/pull/71821)
+- Updated symbol import and export on Windows. [#72497](https://github.com/PaddlePaddle/Paddle/pull/72497), [#72498](https://github.com/PaddlePaddle/Paddle/pull/72498), [#72500](https://github.com/PaddlePaddle/Paddle/pull/72500)
+- Added CUDA 12.8 support on Windows. [#72433](https://github.com/PaddlePaddle/Paddle/pull/72433)
+- CI maintenance and upgrades. [#72443](https://github.com/PaddlePaddle/Paddle/pull/72443), [#72836](https://github.com/PaddlePaddle/Paddle/pull/72836), [#72563](https://github.com/PaddlePaddle/Paddle/pull/72563), [#72653](https://github.com/PaddlePaddle/Paddle/pull/72653), [#72477](https://github.com/PaddlePaddle/Paddle/pull/72477), [#72778](https://github.com/PaddlePaddle/Paddle/pull/72778), [#72960](https://github.com/PaddlePaddle/Paddle/pull/72960), [#73289](https://github.com/PaddlePaddle/Paddle/pull/73289), [#73422](https://github.com/PaddlePaddle/Paddle/pull/73422), [#73514](https://github.com/PaddlePaddle/Paddle/pull/73514), [#72748](https://github.com/PaddlePaddle/Paddle/pull/72748)
+- Built out GitHub Actions CI. [#71738](https://github.com/PaddlePaddle/Paddle/pull/71738), [#70602](https://github.com/PaddlePaddle/Paddle/pull/70602), [#71958](https://github.com/PaddlePaddle/Paddle/pull/71958), [#71959](https://github.com/PaddlePaddle/Paddle/pull/71959), [#71992](https://github.com/PaddlePaddle/Paddle/pull/71992), [#72013](https://github.com/PaddlePaddle/Paddle/pull/72013), [#72153](https://github.com/PaddlePaddle/Paddle/pull/72153), [#72031](https://github.com/PaddlePaddle/Paddle/pull/72031), [#72141](https://github.com/PaddlePaddle/Paddle/pull/72141), [#72104](https://github.com/PaddlePaddle/Paddle/pull/72104), [#72182](https://github.com/PaddlePaddle/Paddle/pull/72182), [#72342](https://github.com/PaddlePaddle/Paddle/pull/72342), [#72352](https://github.com/PaddlePaddle/Paddle/pull/72352), [#72249](https://github.com/PaddlePaddle/Paddle/pull/72249), [#72068](https://github.com/PaddlePaddle/Paddle/pull/72068), [#72441](https://github.com/PaddlePaddle/Paddle/pull/72441), [#72392](https://github.com/PaddlePaddle/Paddle/pull/72392), [#72446](https://github.com/PaddlePaddle/Paddle/pull/72446), [#72435](https://github.com/PaddlePaddle/Paddle/pull/72435), [#72515](https://github.com/PaddlePaddle/Paddle/pull/72515), [#72514](https://github.com/PaddlePaddle/Paddle/pull/72514), [#72396](https://github.com/PaddlePaddle/Paddle/pull/72396), [#72547](https://github.com/PaddlePaddle/Paddle/pull/72547), [#72345](https://github.com/PaddlePaddle/Paddle/pull/72345), [#72236](https://github.com/PaddlePaddle/Paddle/pull/72236), [#72586](https://github.com/PaddlePaddle/Paddle/pull/72586), [#72537](https://github.com/PaddlePaddle/Paddle/pull/72537), [#72609](https://github.com/PaddlePaddle/Paddle/pull/72609), [#72632](https://github.com/PaddlePaddle/Paddle/pull/72632), [#72642](https://github.com/PaddlePaddle/Paddle/pull/72642), [#72673](https://github.com/PaddlePaddle/Paddle/pull/72673), [#72647](https://github.com/PaddlePaddle/Paddle/pull/72647),
[#72696](https://github.com/PaddlePaddle/Paddle/pull/72696), [#72771](https://github.com/PaddlePaddle/Paddle/pull/72771), [#72711](https://github.com/PaddlePaddle/Paddle/pull/72711), [#72680](https://github.com/PaddlePaddle/Paddle/pull/72680), [#72774](https://github.com/PaddlePaddle/Paddle/pull/72774), [#72813](https://github.com/PaddlePaddle/Paddle/pull/72813), [#72804](https://github.com/PaddlePaddle/Paddle/pull/72804), [#72903](https://github.com/PaddlePaddle/Paddle/pull/72903), [#72900](https://github.com/PaddlePaddle/Paddle/pull/72900), [#72932](https://github.com/PaddlePaddle/Paddle/pull/72932), [#72967](https://github.com/PaddlePaddle/Paddle/pull/72967), [#72991](https://github.com/PaddlePaddle/Paddle/pull/72991), [#72115](https://github.com/PaddlePaddle/Paddle/pull/72115), [#73242](https://github.com/PaddlePaddle/Paddle/pull/73242), [#72801](https://github.com/PaddlePaddle/Paddle/pull/72801), [#73433](https://github.com/PaddlePaddle/Paddle/pull/73433), [#73391](https://github.com/PaddlePaddle/Paddle/pull/73391), [#73456](https://github.com/PaddlePaddle/Paddle/pull/73456), [#73376](https://github.com/PaddlePaddle/Paddle/pull/73376), [#73453](https://github.com/PaddlePaddle/Paddle/pull/73453), [#73481](https://github.com/PaddlePaddle/Paddle/pull/73481), [#73546](https://github.com/PaddlePaddle/Paddle/pull/73546), [#73446](https://github.com/PaddlePaddle/Paddle/pull/73446), [#72744](https://github.com/PaddlePaddle/Paddle/pull/72744)
### Deprecated
-- Cleaned up deprecated code and unused unit tests
- [#65894](https://github.com/PaddlePaddle/Paddle/pull/65894), [#66165](https://github.com/PaddlePaddle/Paddle/pull/66165), [#66293](https://github.com/PaddlePaddle/Paddle/pull/66293), [#66102](https://github.com/PaddlePaddle/Paddle/pull/66102), [#66442](https://github.com/PaddlePaddle/Paddle/pull/66442), [#66922](https://github.com/PaddlePaddle/Paddle/pull/66922), [#66531](https://github.com/PaddlePaddle/Paddle/pull/66531), [#65518](https://github.com/PaddlePaddle/Paddle/pull/65518), [#66800](https://github.com/PaddlePaddle/Paddle/pull/66800), [#66372](https://github.com/PaddlePaddle/Paddle/pull/66372), [#65902](https://github.com/PaddlePaddle/Paddle/pull/65902), [#65462](https://github.com/PaddlePaddle/Paddle/pull/65462), [#65327](https://github.com/PaddlePaddle/Paddle/pull/65327), [#65189](https://github.com/PaddlePaddle/Paddle/pull/65189), [#65181](https://github.com/PaddlePaddle/Paddle/pull/65181), [#66535](https://github.com/PaddlePaddle/Paddle/pull/66535), [#65383](https://github.com/PaddlePaddle/Paddle/pull/65383), [#65173](https://github.com/PaddlePaddle/Paddle/pull/65173), [#66429](https://github.com/PaddlePaddle/Paddle/pull/66429), [#66386](https://github.com/PaddlePaddle/Paddle/pull/66386), [#66447](https://github.com/PaddlePaddle/Paddle/pull/66447), [#66367](https://github.com/PaddlePaddle/Paddle/pull/66367), [#66160](https://github.com/PaddlePaddle/Paddle/pull/66160), [#65408](https://github.com/PaddlePaddle/Paddle/pull/65408), [#65433](https://github.com/PaddlePaddle/Paddle/pull/65433), [#65481](https://github.com/PaddlePaddle/Paddle/pull/65481), [#65444](https://github.com/PaddlePaddle/Paddle/pull/65444), [#65389](https://github.com/PaddlePaddle/Paddle/pull/65389), [#65663](https://github.com/PaddlePaddle/Paddle/pull/65663), [#65649](https://github.com/PaddlePaddle/Paddle/pull/65649), [#65629](https://github.com/PaddlePaddle/Paddle/pull/65629), [#66142](https://github.com/PaddlePaddle/Paddle/pull/66142), 
[#65796](https://github.com/PaddlePaddle/Paddle/pull/65796), [#66163](https://github.com/PaddlePaddle/Paddle/pull/66163), [#66291](https://github.com/PaddlePaddle/Paddle/pull/66291), [#65480](https://github.com/PaddlePaddle/Paddle/pull/65480), [#65495](https://github.com/PaddlePaddle/Paddle/pull/65495), [#65498](https://github.com/PaddlePaddle/Paddle/pull/65498), [#65503](https://github.com/PaddlePaddle/Paddle/pull/65503), [#65502](https://github.com/PaddlePaddle/Paddle/pull/65502), [#65501](https://github.com/PaddlePaddle/Paddle/pull/65501), [#65512](https://github.com/PaddlePaddle/Paddle/pull/65512), [#65528](https://github.com/PaddlePaddle/Paddle/pull/65528), [#65472](https://github.com/PaddlePaddle/Paddle/pull/65472), [#65390](https://github.com/PaddlePaddle/Paddle/pull/65390), [#65344](https://github.com/PaddlePaddle/Paddle/pull/65344), [#65384](https://github.com/PaddlePaddle/Paddle/pull/65384), [#65388](https://github.com/PaddlePaddle/Paddle/pull/65388), [#65198](https://github.com/PaddlePaddle/Paddle/pull/65198), [#65248](https://github.com/PaddlePaddle/Paddle/pull/65248), [#65443](https://github.com/PaddlePaddle/Paddle/pull/65443), [#65430](https://github.com/PaddlePaddle/Paddle/pull/65430)
+- Dropped support for building with Python 3.8. [#72827](https://github.com/PaddlePaddle/Paddle/pull/72827)
-## 11. Contributors
+## 9. Contributors
-0x3878f, 0x45f, 2742195759, 86kkd, A-nnonymous, ADream-ki, Aganlengzi, Albresky, AndPuQing, AndSonder, Aoraki-Dream, ApricityXX, Asthestarsfalll, Aurelius84, BHmingyang, BeingGod, Betelgeu, BiynXu, CJ77Qi, Caogration, DDDivano, Dale1314, Deleter-D, DesmonDay, Difers, Dmovic, DongBaiYue, DrRyanHuang, DrownFish19, Eddie-Wang1120, EgoistSA, FeixLiu, ForFishes, Fripping, From00, Function-Samuel, GoldenStain, Guanhuachen2003, GuoxiaWang, Hanyonggong, HarperCy, Hongqing-work, HydrogenSulfate, JZ-LIANG, Jeff114514, JiaWenxuan, LLee233, LanCole, Lans1ot, Layssy, Leoforever123, LiYuRio, LielinJiang, LittleHeroZZZX, Liujie0926, Liyulingyue, Luohongzhige, Marcusryz, MarisaSparkL, Micalling, MikhayEeer, MrXnneHang, MufanColin, NKNaN, Neo-WY, NeroLoh, PolaKuma, Qin-sx, QingshuChen, RachelXu7, RichardWooSJTU, RuohengMa, SCUcookie, Sekiro-x, SigureMo, Sunny-bot1, SylarTiaNII, Sylence8, TBD1, TR666, TimeYWL, Tom-Zheng, Turingg, Victor-Bayim, Vvsmile, WAYKEN-TSE, Wanglongzhi2001, Wangzheee, Waynezee, Wennie396, Whsjrczr, Wizard-ZP, Wong4j, XavierZXY, XiaociZhang, XieYunshen, Xing-lil, Xreki, YKTian-x2b, YZW-explorer, YanhuiDua, YuanRisheng, ZHOU05030, ZhangHandi, ZhangX-21, ZibinGuo, a2064968462, anderson101866, aooxin, aquagull, baoqiwen, bapijun, blacksheep-Aristotle, bukejiyu, carryyu, ccsuzzh, chang-wenbin, changeyoung98, chen2016013, ckl117, cmcamdy, co63oc, continue-coding, cqulilujia, crazyxiaoxi, cszdrg, cubehan3, cyber-pioneer, danleifeng, decade-afk, deepllz, dynamicheart, eee4017, eggman-1024, enkilee, epiphanyer, ethan-sem, fangfangssj, feixi21, fightfat, fufu0615, fxfxfxfxfxfxfxfx, fxy1699, gitliuyf, gongel, gongshaotian, gongweibao, gouzil, gsq7474741, guixxiic, gzy19990617, hanyang2508, haoyu2022, heavyrain-lzy, houj04, huangjiyi, huangkr03, hxzd5568, icpcccpc, inaomIIsfarell, iosmers, jeff41404, jerrywgz, jiachengdai, jiahy0825, jinmingyi1998, jinyouzhi, joseflv, jychen21, jzhang533, kangguangli, kanze1, kineast, kircle888, l1cacheDell, leo0519, lifulll, linkk08, 
little1d, liufengwei0103, liuruyan, lixcli, liym27, liyongchao911, lizexu123, lizhenyun01, lj970926, lshpku, lszxb, ltd0924, luotao1, lwkhahaha, lxd-cumt, mayang002, megemini, mikemikimike, ming1753, monster1015, mori0umi, ndyysheep, nizne9, nobodynobody, ooooo-create, penPenf28, phlrain, pkuzyc, qili93, rich04lin, risemeup1, ronny1996, rsmallblue, runzhech, skywalker2012, smile2game, sneaxiy, successfulbarrier, sunzhongkai588, swgu98, tc20042008, tianhaodongbd, tianshuo78520a, tizhou86, tlxd, uanu2002, umiswing, vivienfanghuagood, waliwali777, walkalone20, wanghuancoder, wangna11BD, will-jl944, winffke, winter-wang, wwwuyan, xiaoguoguo626807, xiaoluomi, xiaoyao0115, xingmingyyj, xkkkkkk23, xu8117, xuxinyi389, xz-alex, yangrongxinuser, yeteye, yinfan98, yongqiangma, yuan20041218, yuanlehome, yuguo-Jack, yumin066, zbt78, zeroRains, zhangbo9674, zhanghonggeng, zhanglirong1999, zhangting2020, zhangyk0314, zhangyuqin1998, zhiminzhang0830, zhink, zhiqiu, zhouquan32, zhoutianzi666, zhwesky2010, zoooo0820, zrr1999, zty-king, zxcd, zyfncg
+0x3878f, A-nnonymous, AndSonder, ApricityXX, aquagull, author, baoqiwen, BeingGod, blacksheep-Aristotle, BoShen5, bukejiyu, cangtianhuang, carryyu, chang-wenbin, changeyoung98, chen2016013, ckl117, co63oc, cqulilujia, crashbussy, cszdrg, Cutelemon6, cyy536, DanielSun11, danleifeng, datutu-L, deepllz, Dmovic, DrRyanHuang, dynamicheart, Eddie-Wang1120, eggman-1024, emmanuel-ferdman, Enigmatisms, enkilee, fangfangssj, feixi21, FeixLiu, ForFishes, Function-Samuel, ggggxm, GITD245, Glencsa, GoldenStain, gongshaotian, gouzil, gzy19990617, hanlintang, Hongqing-work, houj04, huangjiyi, hxzd5568, HydrogenSulfate, jzhang533, LCStayingdullCircuit, leon062112, lifulll, linkk08, LittleHeroZZZX, liufengwei0103, Liujie0926, liuruyan, lixinqi, LiYuRio, lizexu123, lizhenyun01, lj970926, lshpku, megemini, mikethegoblin, ming1753, mzj104, NKNaN, ooooo-create, pesionzhao, phlrain, pkuzyc, PolaKuma, Qin-sx, RichardWooSJTU, risemeup1, runzhech, RuohengMa, sasaya123, shanjiang7, SigureMo, sneaxiy, swgu98, SylarTiaNII, tianhaodongbd, tianshuo78520a, timminator, tizhou86, umiswing, waliwali777, wanghuancoder, Waynezee, Wennie396, xiaoguoguo626807, XieYunshen, Xing-lil, xkkkkkk23, Xreki, xuxinyi389, Yeenyeong, yongqiangma, YqGe585, yuanlehome, YuanRisheng, yulangz, yuwu46, zeroRains, zhangbo9674, zhanghonggeng, zhangting2020, ZhangX-21, zhangyk0314, zhangyuqin1998, zhink, zhiqiu, zhouquan32, zhoutianzi666, zhupengyang, zrr1999, zty-king, zyfncg
diff --git a/docs/release_note_en.md b/docs/release_note_en.md
index a698d6d81a3..e5b53007587 100644
--- a/docs/release_note_en.md
+++ b/docs/release_note_en.md
@@ -1,543 +1,292 @@
-# 3.0 Release Note
+# 3.1 Release Note
-Declaration: This document is translated by [Baidu Translate](https://fanyi.baidu.com/)
+PaddlePaddle framework 3.1 further optimizes and polishes automatic parallelism, its core feature, improving both usability and performance. It adds support for FP8 low-precision training, which speeds up large language model training by 10-20%. The hardware extension mechanism has been improved to lower the cost of adapting CUDA-like hardware: users only need to register kernels. The framework's basic capabilities have also been strengthened to improve stability. The key updates are as follows:
-As China's first independently developed industrial-grade deep learning platform, PaddlePaddle has always adhered to the open-source path, supporting the intelligent upgrade of industries. The PaddlePaddle framework version 3.0 not only continues the characteristics of the PaddlePaddle framework 2.0 series, which unifies static and dynamic operations and integrates training and inference, but also achieves breakthroughs in automatic parallelism, neural network compilers, and high-order automatic differentiation, providing strong support for technological innovation and industrial applications in the era of large models, and creating a one-stop, high-performance deep learning development experience for developers. Whether it is cutting-edge algorithm research or the implementation of industrial-grade large models, PaddlePaddle framework version 3.0 will become the preferred tool for developers. Key features are described as follows:
+- **Automatic parallel architecture:** The automatic parallel architecture has been further refined to make its core mechanism easier to use and to improve dynamic graph performance. Improvements to the core mechanism include new slicing derivation rules for multiple operators, support for distributed tensors in which a single dimension is sliced by multiple mesh dimensions, and support for dynamic graph parallel strategies (PP, CP, SEP, TP-CONV). The dynamic graph automatic parallel system has also been systematically optimized, reaching performance essentially on par with manual parallelism on models such as Llama2, Qwen, and Baichuan.
+- **Low-precision training:** Built on the blockwise FP8 GEMM operator, low-precision training reaches accuracy comparable to BF16 while speeding up large model training by 10-20%.
+- **Heterogeneous multi-chip adaptation:** Provides an operator reuse mechanism for CUDA-like hardware, where users only need to register the corresponding kernel to use it.
+- **Framework stability enhancement:** Systematically fixed operator calculation errors for 0-Size tensors and large dimensions.
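The automatic parallel item above mentions slicing one tensor dimension by more than one mesh dimension. The following is a minimal NumPy sketch of that idea only, not Paddle's implementation: the 2 x 2 mesh, the function name `shard_dim0_by_two_mesh_dims`, and the nesting order (mesh dimension 0 as the outer split) are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical 2 x 2 device mesh; the shape is illustrative only.
MESH_SHAPE = (2, 2)


def shard_dim0_by_two_mesh_dims(x, mesh_shape):
    """Split axis 0 of x across mesh dim 0, then split each piece across mesh dim 1.

    Returns a dict mapping mesh coordinates (i, j) to the local shard.
    The nesting order (mesh dim 0 as the outer split) is an assumption.
    """
    shards = {}
    for i, outer in enumerate(np.split(x, mesh_shape[0], axis=0)):
        for j, inner in enumerate(np.split(outer, mesh_shape[1], axis=0)):
            shards[(i, j)] = inner
    return shards


x = np.arange(8 * 3).reshape(8, 3)
shards = shard_dim0_by_two_mesh_dims(x, MESH_SHAPE)

# Concatenating the shards in mesh order recovers the global tensor.
restored = np.concatenate(
    [shards[(i, j)] for i in range(MESH_SHAPE[0]) for j in range(MESH_SHAPE[1])],
    axis=0,
)
```

Each of the four devices ends up holding a contiguous quarter of axis 0, so the global tensor is the in-order concatenation of the local shards.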
-- **Unified Static and Dynamic Automatic Parallelism:** This feature significantly reduces the cost of industrial development and training. Users only need to perform a small amount of tensor slicing marking on a single card, and the PaddlePaddle framework will automatically derive the distributed slicing information and add communication operators to ensure logical correctness. At the same time, based on the model structure and cluster information, combined with the optimization of memory and scheduling layers, PaddlePaddle can automatically find the most efficient distributed parallel strategy, thereby significantly reducing the development cost of hybrid parallel training and enabling developers to focus more on model and algorithm innovation. The automatic parallel architecture has undergone in-depth verification and polishing to better support the pre-training + fine-tuning process for common large model scenarios such as pure dense models, pure sparse models (MoE), and multi-modal understanding models. It improves the slicing derivation rules of operators and supports converting automatic parallel training parameters into manual parallel parameters for downstream inference, achieving comprehensive usability and helping users reduce the development cost of large model parallel programs. Additionally, to further simplify the user's distributed development process, a new `paddle.distributed.parallel` interface is introduced. Based on the encapsulation of distributed tensor marking syntax, it supports users in non-intrusively configuring common parallel strategies such as data parallelism, model parallelism, and pipeline parallelism outside of the model networking. 
Furthermore, the static graph automatic parallel architecture has undergone a comprehensive upgrade based on PIR, with the underlying basic components, core modules, parallel strategies, and performance optimization strategies all implemented uniformly based on the extended PIR `DistDialect`, further enhancing the consistency of automatic parallelism between static and dynamic states, and achieving performance levels on the Llama series models that are on par with or even surpass manual parallel methods.
-- **Integrated Training and Inference for Large Models:** Since version 2.0, PaddlePaddle has adopted the design philosophy of "unified dynamic and static, integrated training and inference," and version 3.0 will continue to uphold this philosophy. Thanks to the unified dynamic and static architecture and interface design, PaddlePaddle fully supports both dynamic and static graph modes, and possesses excellent whole-graph export capabilities. The success rate of whole-graph export from dynamic to static in PaddlePaddle is as high as 95%, surpassing PyTorch's 62%. "Integrated training and inference" means being able to reuse training and inference code, especially model networking code, within the same framework. After completing the development and training of the model, only a small amount of development work is required to achieve rapid inference deployment. This feature provides an ultimate development experience for the industry. It enables the reuse of training and inference capabilities, providing a unified development experience and ultimate training efficiency for the entire process of large models. Through the work of transitioning from dynamic to static, the training and inference tasks can be seamlessly connected. It supports multiple mainstream large models, and the DeepSeek-R1 full-blood version achieves single-machine deployment with doubled throughput.
-- **High-order differential in scientific computing:** PaddlePaddle Framework 3.0 provides support for high-order automatic differentiation, compilation optimization, and distributed training capabilities for scientific computing. Experiments on 41 different equations on NVIDIA Modulus show that the differential equation solving speed of PaddlePaddle is on average 115% faster than the version of PyTorch with compiler optimization enabled. Additionally, PaddlePaddle has also established the PaddleScience toolkit for solving general mathematical problems and the PaddleHelix toolkit focused on biological computing. Furthermore, PaddlePaddle Framework 3.0 natively supports complex technology systems, which is of great significance for data feature analysis in scenarios such as weather forecasting and aerodynamic analysis of automobiles and aircraft.
-- **Neural Network Compiler:** This feature significantly reduces the cost of performance optimization. The compiler of PaddlePaddle adopts an integrated design with the framework, capable of supporting efficient training and variable-shape inference for various models such as generative models and scientific computing models, providing a good balance between computational flexibility and high performance. After using the CINN compiler, over 60% of the models have shown significant performance improvements, with an average increase of 27.4%. The CINN neural network compiler has comprehensive improvements in completeness and performance. In this version, we have comprehensively optimized the front-end and back-end aspects of the compiler: including adding an automatic Re-Compute mechanism for reverse computation graphs, front-end Pass performance optimization, upgrading the symbol derivation mechanism, optimizing operator fusion strategies, enhancing the back-end Schedule strategy and subscript expression simplification capabilities, etc. At the same time, we have investigated and fixed a large number of correctness and performance issues, systematically improving the general optimization capabilities of the compiler.
-- **Heterogeneous Multi-Chips Adaptation:** One of the key features of PaddlePaddle is its ability to adapt to heterogeneous multi-core environments and fully leverage hardware potential. In terms of access mechanism, PaddlePaddle provides simple and efficient abstract interfaces and a basic operator system, reducing the cost of adaptation. In terms of operation mechanism, it optimizes scheduling and storage sharing mechanisms, enhancing scheduling efficiency. From the perspective of operator kernels, PaddlePaddle offers a compiler-based automatic fusion and tuning solution to improve end-to-end performance. Additionally, PaddlePaddle has established research and development infrastructure for new hardware vendors, including code integration, continuous integration, and model regression testing. These mechanisms ensure that new hardware is incorporated into PaddlePaddle's normal release system, allowing users to install and try it directly without the need for compilation. PaddlePaddle's comprehensive functionality and low-cost access mechanism have attracted hardware vendors to contribute a total of 4001 pull requests (PRs), encompassing 26584 commits.
+## 1. User experience
-In addition to the above core features, **Highly Extensible Intermediate Representation** To enhance the scalability of the PaddlePaddle framework, we have developed the Highly Extensible Intermediate Representation (PIR), which systematically abstracts the underlying core concepts and provides flexible and efficient components. As an infrastructure, PIR supports multiple technologies such as dynamic-to-static, automatic differentiation, automatic parallelization, combinational operators, and graph optimization, and is widely used in distributed training, model compression, and inference deployment scenarios. Through the Declarative Rewrite Rule (DRR) mechanism provided by PIR, the development cost of Pass can be reduced by 60%. At the same time, PIR has been verified in all scenarios and is enabled by default, supporting one-click dynamic-to-static conversion, ensuring excellent performance and good scalability of the framework. Continuous improvements have been made to the existing functions of the framework version 2.0, and new features have brought significant improvements in user experience, performance, ease of secondary development, and hardware adaptability. This version continues to enrich and enhance the API functions to meet more scenarios at the user experience level. For large model scenarios, optimization and improvement have been made to the distributed parallel strategy optimization and inference function enhancement. Thorough usability improvements have been made in terms of compilation and installation, with a new synchronous upgrade of the installation method and version of dependent packages. Comprehensive reinforcement of system security has been carried out, and comprehensive error correction checks have been conducted on product documentation. At the same time, a large amount of cleanup has been done on some obsolete code to ensure the simplicity of the architecture.
-
-## Incompatible upgrade
-
-PaddlePaddle API supports implicit type promotion. In the most commonly used calculations such as addition, subtraction, multiplication, and division, if the data types of the two inputs are different, it is necessary to determine the data type of the output. Historically, PaddlePaddle has only partially supported implicit type promotion, and the actual rules are unclear. Objectively, this manifests as inconsistencies between dynamic and static graphs, inconsistencies between API and operator overloading, and non-compliance with commutativity. Especially when large models widely use mixed calculations with bf16/fp16 and fp32, unexpected issues are prone to occur and are difficult to locate. Starting from the 3.0 beta version, PaddlePaddle has clarified the [implicit data type promotion rules](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/advanced/auto_type_promotion_cn.html), which defines in detail the types of calculation results for Tensor and Tensor, as well as Tensor and a scalar (Scalar), ensuring that calculations comply with commutativity, operator overloading is consistent with binary API results, and dynamic graphs and static graphs produce consistent results. This is more in line with user understanding and industry habits. [#60638](https://github.com/PaddlePaddle/Paddle/pull/60638), [#63842](https://github.com/PaddlePaddle/Paddle/pull/63842), [#60011](https://github.com/PaddlePaddle/Paddle/pull/60011)
-
-## Discontinued Features
-
-Support for 0-dimensional Tensor has been stable for two versions. In this version, the switch FLAGS_set_to_1d, which converts 0-dimensional Tensor to a 1-dimensional Tensor containing only one element in some cases, has been removed. This switch is to accommodate incorrect writing in some suites where 0-dimensional Tensor is represented by a 1-dimensional Tensor containing only one element. That is, PaddlePaddle now fully distinguishes between the semantics of 0-dimensional Tensor and 1-dimensional Tensor containing only one element, and the two are not equivalent. [#61227](https://github.com/PaddlePaddle/Paddle/pull/61227)
-
-## 1. User experience upgrade
+API enhancements, bug fixes, and other improvements in this release focus on user experience and API usability. The `paddle.randn_like` API has been added, multiple functional defects in existing APIs have been fixed, and support for complex types and 0-Size Tensors has been enhanced. Documentation and code have been updated accordingly to improve overall accuracy and professionalism.
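As a rough illustration of what a `randn_like`-style API is expected to do, here is a NumPy stand-in rather than a call into Paddle. It assumes the usual `*_like` convention (standard-normal samples with the shape, and by default the dtype, of the input); the helper name `randn_like_reference` and its `seed` parameter are hypothetical, and the actual `paddle.randn_like` signature should be checked in the API documentation.

```python
import numpy as np


def randn_like_reference(x, dtype=None, seed=None):
    """NumPy stand-in: standard-normal samples shaped like x.

    The output dtype defaults to x.dtype; this mirrors the common *_like
    convention and is an assumption, not Paddle's documented behavior.
    """
    rng = np.random.default_rng(seed)
    out_dtype = x.dtype if dtype is None else np.dtype(dtype)
    return rng.standard_normal(size=x.shape).astype(out_dtype)


x = np.zeros((2, 3), dtype=np.float32)
y = randn_like_reference(x, seed=0)

# A 0-size input yields a 0-size output of the same shape.
empty = randn_like_reference(np.zeros((0, 4), dtype=np.float32))
```

The 0-size case is included because the paragraph above singles out 0-Size Tensor support: the result keeps the input shape and simply contains no elements.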
### New Features
-- Added PaddlePaddle APIs to expand PaddlePaddle's functionality. These include `paddle.nn.FeatureAlphaDropout`, `paddle.cartesian_prod`, `paddle.distributed.to_distributed`, `paddle.pi`, etc. [#64881](https://github.com/PaddlePaddle/Paddle/pull/64881), [#65605](https://github.com/PaddlePaddle/Paddle/pull/65605), [#70757](https://github.com/PaddlePaddle/Paddle/pull/70757), [#71030](https://github.com/PaddlePaddle/Paddle/pull/71030), [#69946](https://github.com/PaddlePaddle/Paddle/pull/69946), [#70021](https://github.com/PaddlePaddle/Paddle/pull/70021), [#69613](https://github.com/PaddlePaddle/Paddle/pull/69613), [#68123](https://github.com/PaddlePaddle/Paddle/pull/68123), [#70032](https://github.com/PaddlePaddle/Paddle/pull/70032)
-- Introduce new Tensor class methods and attributes, along with corresponding unit tests, to enhance the usability of Tensor. [#68334](https://github.com/PaddlePaddle/Paddle/pull/68334), [#68681](https://github.com/PaddlePaddle/Paddle/pull/68681), [#69132](https://github.com/PaddlePaddle/Paddle/pull/69132), [#69270](https://github.com/PaddlePaddle/Paddle/pull/69270), [#69256](https://github.com/PaddlePaddle/Paddle/pull/69256), [#69197](https://github.com/PaddlePaddle/Paddle/pull/69197), [#69231](https://github.com/PaddlePaddle/Paddle/pull/69231), [#69222](https://github.com/PaddlePaddle/Paddle/pull/69222), [#69257](https://github.com/PaddlePaddle/Paddle/pull/69257), [#69301](https://github.com/PaddlePaddle/Paddle/pull/69301), [#69361](https://github.com/PaddlePaddle/Paddle/pull/69361), [#69348](https://github.com/PaddlePaddle/Paddle/pull/69348), [#69464](https://github.com/PaddlePaddle/Paddle/pull/69464), [#69542](https://github.com/PaddlePaddle/Paddle/pull/69542), [#69667](https://github.com/PaddlePaddle/Paddle/pull/69667), [#69563](https://github.com/PaddlePaddle/Paddle/pull/69563), [#69796](https://github.com/PaddlePaddle/Paddle/pull/69796), [#69477](https://github.com/PaddlePaddle/Paddle/pull/69477), [#69779](https://github.com/PaddlePaddle/Paddle/pull/69779), [#69724](https://github.com/PaddlePaddle/Paddle/pull/69724), [#69835](https://github.com/PaddlePaddle/Paddle/pull/69835), [#69781](https://github.com/PaddlePaddle/Paddle/pull/69781), [#69982](https://github.com/PaddlePaddle/Paddle/pull/69982), [#69913](https://github.com/PaddlePaddle/Paddle/pull/69913), [#70026](https://github.com/PaddlePaddle/Paddle/pull/70026), [#70013](https://github.com/PaddlePaddle/Paddle/pull/70013), [#69539](https://github.com/PaddlePaddle/Paddle/pull/69539), [#69736](https://github.com/PaddlePaddle/Paddle/pull/69736), [#69841](https://github.com/PaddlePaddle/Paddle/pull/69841), [#70277](https://github.com/PaddlePaddle/Paddle/pull/70277), 
[#69580](https://github.com/PaddlePaddle/Paddle/pull/69580), [#69599](https://github.com/PaddlePaddle/Paddle/pull/69599), [#69693](https://github.com/PaddlePaddle/Paddle/pull/69693), [#69848](https://github.com/PaddlePaddle/Paddle/pull/69848), [#69751](https://github.com/PaddlePaddle/Paddle/pull/69751), [#70556](https://github.com/PaddlePaddle/Paddle/pull/70556), [#70591](https://github.com/PaddlePaddle/Paddle/pull/70591), [#69673](https://github.com/PaddlePaddle/Paddle/pull/69673), [#70647](https://github.com/PaddlePaddle/Paddle/pull/70647), [#68192](https://github.com/PaddlePaddle/Paddle/pull/68192), [#68511](https://github.com/PaddlePaddle/Paddle/pull/68511), [#68833](https://github.com/PaddlePaddle/Paddle/pull/68833), [#69406](https://github.com/PaddlePaddle/Paddle/pull/69406), [#69480](https://github.com/PaddlePaddle/Paddle/pull/69480), [#69463](https://github.com/PaddlePaddle/Paddle/pull/69463), [#69632](https://github.com/PaddlePaddle/Paddle/pull/69632), [#69473](https://github.com/PaddlePaddle/Paddle/pull/69473), [#68694](https://github.com/PaddlePaddle/Paddle/pull/68694), [#69534](https://github.com/PaddlePaddle/Paddle/pull/69534), [#69820](https://github.com/PaddlePaddle/Paddle/pull/69820), [#70121](https://github.com/PaddlePaddle/Paddle/pull/70121)
-
-### API Function Enhancement
-
-- Enhanced the functionality of 43 APIs, making existing APIs easier to use and facilitating code conversion. This includes but is not limited to adding API parameters, expanding the data types supported by APIs, and correcting existing unreasonable designs. [#65105](https://github.com/PaddlePaddle/Paddle/pull/65105), [#65103](https://github.com/PaddlePaddle/Paddle/pull/65103), [#62975](https://github.com/PaddlePaddle/Paddle/pull/62975), [#64436](https://github.com/PaddlePaddle/Paddle/pull/64436), [#63346](https://github.com/PaddlePaddle/Paddle/pull/63346), [#68079](https://github.com/PaddlePaddle/Paddle/pull/68079), [#67878](https://github.com/PaddlePaddle/Paddle/pull/67878), [#68432](https://github.com/PaddlePaddle/Paddle/pull/68432), [#68677](https://github.com/PaddlePaddle/Paddle/pull/68677), [#69012](https://github.com/PaddlePaddle/Paddle/pull/69012), [#69385](https://github.com/PaddlePaddle/Paddle/pull/69385), [#65032](https://github.com/PaddlePaddle/Paddle/pull/65032), [#64977](https://github.com/PaddlePaddle/Paddle/pull/64977), [#67071](https://github.com/PaddlePaddle/Paddle/pull/67071), [#67298](https://github.com/PaddlePaddle/Paddle/pull/67298), [#66687](https://github.com/PaddlePaddle/Paddle/pull/66687), [#65946](https://github.com/PaddlePaddle/Paddle/pull/65946), [#66170](https://github.com/PaddlePaddle/Paddle/pull/66170), [#66929](https://github.com/PaddlePaddle/Paddle/pull/66929), [#67994](https://github.com/PaddlePaddle/Paddle/pull/67994), [#67947](https://github.com/PaddlePaddle/Paddle/pull/67947), [#68033](https://github.com/PaddlePaddle/Paddle/pull/68033), [#68046](https://github.com/PaddlePaddle/Paddle/pull/68046), [#68294](https://github.com/PaddlePaddle/Paddle/pull/68294), [#68214](https://github.com/PaddlePaddle/Paddle/pull/68214), [#68281](https://github.com/PaddlePaddle/Paddle/pull/68281), [#68390](https://github.com/PaddlePaddle/Paddle/pull/68390), [#68772](https://github.com/PaddlePaddle/Paddle/pull/68772), 
[#69451](https://github.com/PaddlePaddle/Paddle/pull/69451), [#69252](https://github.com/PaddlePaddle/Paddle/pull/69252), [#69529](https://github.com/PaddlePaddle/Paddle/pull/69529), [#69750](https://github.com/PaddlePaddle/Paddle/pull/69750), [#69827](https://github.com/PaddlePaddle/Paddle/pull/69827), [#69099](https://github.com/PaddlePaddle/Paddle/pull/69099), [#68594](https://github.com/PaddlePaddle/Paddle/pull/68594), [#70090](https://github.com/PaddlePaddle/Paddle/pull/70090), [#70228](https://github.com/PaddlePaddle/Paddle/pull/70228), [#70166](https://github.com/PaddlePaddle/Paddle/pull/70166), [#70389](https://github.com/PaddlePaddle/Paddle/pull/70389), [#70790](https://github.com/PaddlePaddle/Paddle/pull/70790), [#71029](https://github.com/PaddlePaddle/Paddle/pull/71029), [#71283](https://github.com/PaddlePaddle/Paddle/pull/71283), [#71342](https://github.com/PaddlePaddle/Paddle/pull/71342)
-- PaddlePaddle Python API fully supports type hints. All parameters and return values of Python API have been annotated with type hints for ease of development and use. [#65209](https://github.com/PaddlePaddle/Paddle/pull/65209), [#65201](https://github.com/PaddlePaddle/Paddle/pull/65201), [#65190](https://github.com/PaddlePaddle/Paddle/pull/65190), [#65082](https://github.com/PaddlePaddle/Paddle/pull/65082), [#65226](https://github.com/PaddlePaddle/Paddle/pull/65226), [#65076](https://github.com/PaddlePaddle/Paddle/pull/65076), [#65238](https://github.com/PaddlePaddle/Paddle/pull/65238), [#65236](https://github.com/PaddlePaddle/Paddle/pull/65236), [#65247](https://github.com/PaddlePaddle/Paddle/pull/65247), [#65249](https://github.com/PaddlePaddle/Paddle/pull/65249), [#65244](https://github.com/PaddlePaddle/Paddle/pull/65244), [#65272](https://github.com/PaddlePaddle/Paddle/pull/65272), [#65191](https://github.com/PaddlePaddle/Paddle/pull/65191), [#65290](https://github.com/PaddlePaddle/Paddle/pull/65290), [#65255](https://github.com/PaddlePaddle/Paddle/pull/65255), [#65292](https://github.com/PaddlePaddle/Paddle/pull/65292), [#65300](https://github.com/PaddlePaddle/Paddle/pull/65300), [#65301](https://github.com/PaddlePaddle/Paddle/pull/65301), [#65332](https://github.com/PaddlePaddle/Paddle/pull/65332), [#65323](https://github.com/PaddlePaddle/Paddle/pull/65323), [#65326](https://github.com/PaddlePaddle/Paddle/pull/65326), [#65273](https://github.com/PaddlePaddle/Paddle/pull/65273), [#65317](https://github.com/PaddlePaddle/Paddle/pull/65317), [#65354](https://github.com/PaddlePaddle/Paddle/pull/65354), [#65283](https://github.com/PaddlePaddle/Paddle/pull/65283), [#65372](https://github.com/PaddlePaddle/Paddle/pull/65372), [#65337](https://github.com/PaddlePaddle/Paddle/pull/65337), [#65085](https://github.com/PaddlePaddle/Paddle/pull/65085), [#65382](https://github.com/PaddlePaddle/Paddle/pull/65382), [#65381](https://github.com/PaddlePaddle/Paddle/pull/65381), 
[#65378](https://github.com/PaddlePaddle/Paddle/pull/65378), [#65274](https://github.com/PaddlePaddle/Paddle/pull/65274), [#65380](https://github.com/PaddlePaddle/Paddle/pull/65380), [#65386](https://github.com/PaddlePaddle/Paddle/pull/65386), [#65351](https://github.com/PaddlePaddle/Paddle/pull/65351), [#65284](https://github.com/PaddlePaddle/Paddle/pull/65284), [#65366](https://github.com/PaddlePaddle/Paddle/pull/65366), [#65308](https://github.com/PaddlePaddle/Paddle/pull/65308), [#65375](https://github.com/PaddlePaddle/Paddle/pull/65375), [#65376](https://github.com/PaddlePaddle/Paddle/pull/65376), [#65464](https://github.com/PaddlePaddle/Paddle/pull/65464), [#65197](https://github.com/PaddlePaddle/Paddle/pull/65197), [#65455](https://github.com/PaddlePaddle/Paddle/pull/65455), [#65457](https://github.com/PaddlePaddle/Paddle/pull/65457), [#65487](https://github.com/PaddlePaddle/Paddle/pull/65487), [#65486](https://github.com/PaddlePaddle/Paddle/pull/65486), [#65547](https://github.com/PaddlePaddle/Paddle/pull/65547), [#65504](https://github.com/PaddlePaddle/Paddle/pull/65504), [#65460](https://github.com/PaddlePaddle/Paddle/pull/65460), [#65183](https://github.com/PaddlePaddle/Paddle/pull/65183), [#65454](https://github.com/PaddlePaddle/Paddle/pull/65454), [#65559](https://github.com/PaddlePaddle/Paddle/pull/65559), [#65560](https://github.com/PaddlePaddle/Paddle/pull/65560), [#65570](https://github.com/PaddlePaddle/Paddle/pull/65570), [#65569](https://github.com/PaddlePaddle/Paddle/pull/65569), [#65566](https://github.com/PaddlePaddle/Paddle/pull/65566), [#65620](https://github.com/PaddlePaddle/Paddle/pull/65620), [#65568](https://github.com/PaddlePaddle/Paddle/pull/65568), [#65567](https://github.com/PaddlePaddle/Paddle/pull/65567), [#65660](https://github.com/PaddlePaddle/Paddle/pull/65660), [#65645](https://github.com/PaddlePaddle/Paddle/pull/65645), [#65600](https://github.com/PaddlePaddle/Paddle/pull/65600), 
[#65532](https://github.com/PaddlePaddle/Paddle/pull/65532), [#65765](https://github.com/PaddlePaddle/Paddle/pull/65765), [#65767](https://github.com/PaddlePaddle/Paddle/pull/65767), [#65770](https://github.com/PaddlePaddle/Paddle/pull/65770), [#65768](https://github.com/PaddlePaddle/Paddle/pull/65768), [#65771](https://github.com/PaddlePaddle/Paddle/pull/65771), [#65772](https://github.com/PaddlePaddle/Paddle/pull/65772), [#65774](https://github.com/PaddlePaddle/Paddle/pull/65774), [#65769](https://github.com/PaddlePaddle/Paddle/pull/65769), [#65773](https://github.com/PaddlePaddle/Paddle/pull/65773), [#65766](https://github.com/PaddlePaddle/Paddle/pull/65766), [#65776](https://github.com/PaddlePaddle/Paddle/pull/65776), [#65775](https://github.com/PaddlePaddle/Paddle/pull/65775), [#65755](https://github.com/PaddlePaddle/Paddle/pull/65755), [#65779](https://github.com/PaddlePaddle/Paddle/pull/65779), [#65777](https://github.com/PaddlePaddle/Paddle/pull/65777), [#65823](https://github.com/PaddlePaddle/Paddle/pull/65823), [#65807](https://github.com/PaddlePaddle/Paddle/pull/65807), [#65821](https://github.com/PaddlePaddle/Paddle/pull/65821), [#65819](https://github.com/PaddlePaddle/Paddle/pull/65819), [#65810](https://github.com/PaddlePaddle/Paddle/pull/65810), [#65808](https://github.com/PaddlePaddle/Paddle/pull/65808), [#65824](https://github.com/PaddlePaddle/Paddle/pull/65824), [#65553](https://github.com/PaddlePaddle/Paddle/pull/65553), [#65818](https://github.com/PaddlePaddle/Paddle/pull/65818), [#65812](https://github.com/PaddlePaddle/Paddle/pull/65812), [#65803](https://github.com/PaddlePaddle/Paddle/pull/65803), [#65865](https://github.com/PaddlePaddle/Paddle/pull/65865), [#65870](https://github.com/PaddlePaddle/Paddle/pull/65870), [#65866](https://github.com/PaddlePaddle/Paddle/pull/65866), [#65844](https://github.com/PaddlePaddle/Paddle/pull/65844), [#65845](https://github.com/PaddlePaddle/Paddle/pull/65845), 
[#65853](https://github.com/PaddlePaddle/Paddle/pull/65853), [#65874](https://github.com/PaddlePaddle/Paddle/pull/65874), [#65871](https://github.com/PaddlePaddle/Paddle/pull/65871), [#65809](https://github.com/PaddlePaddle/Paddle/pull/65809), [#65867](https://github.com/PaddlePaddle/Paddle/pull/65867), [#65822](https://github.com/PaddlePaddle/Paddle/pull/65822), [#65872](https://github.com/PaddlePaddle/Paddle/pull/65872), [#65873](https://github.com/PaddlePaddle/Paddle/pull/65873), [#65869](https://github.com/PaddlePaddle/Paddle/pull/65869), [#65868](https://github.com/PaddlePaddle/Paddle/pull/65868), [#65849](https://github.com/PaddlePaddle/Paddle/pull/65849), [#65875](https://github.com/PaddlePaddle/Paddle/pull/65875), [#65876](https://github.com/PaddlePaddle/Paddle/pull/65876), [#65843](https://github.com/PaddlePaddle/Paddle/pull/65843), [#65727](https://github.com/PaddlePaddle/Paddle/pull/65727), [#65587](https://github.com/PaddlePaddle/Paddle/pull/65587), [#66006](https://github.com/PaddlePaddle/Paddle/pull/66006), [#66005](https://github.com/PaddlePaddle/Paddle/pull/66005), [#65785](https://github.com/PaddlePaddle/Paddle/pull/65785), [#65784](https://github.com/PaddlePaddle/Paddle/pull/65784), [#65811](https://github.com/PaddlePaddle/Paddle/pull/65811), [#65919](https://github.com/PaddlePaddle/Paddle/pull/65919), [#65838](https://github.com/PaddlePaddle/Paddle/pull/65838), [#65852](https://github.com/PaddlePaddle/Paddle/pull/65852), [#65847](https://github.com/PaddlePaddle/Paddle/pull/65847), [#66014](https://github.com/PaddlePaddle/Paddle/pull/66014), [#65805](https://github.com/PaddlePaddle/Paddle/pull/65805), [#66009](https://github.com/PaddlePaddle/Paddle/pull/66009), [#66012](https://github.com/PaddlePaddle/Paddle/pull/66012), [#65633](https://github.com/PaddlePaddle/Paddle/pull/65633), [#66011](https://github.com/PaddlePaddle/Paddle/pull/66011), [#66010](https://github.com/PaddlePaddle/Paddle/pull/66010), 
[#66013](https://github.com/PaddlePaddle/Paddle/pull/66013), [#66015](https://github.com/PaddlePaddle/Paddle/pull/66015), [#66016](https://github.com/PaddlePaddle/Paddle/pull/66016), [#66030](https://github.com/PaddlePaddle/Paddle/pull/66030), [#66028](https://github.com/PaddlePaddle/Paddle/pull/66028), [#66029](https://github.com/PaddlePaddle/Paddle/pull/66029), [#66054](https://github.com/PaddlePaddle/Paddle/pull/66054), [#66040](https://github.com/PaddlePaddle/Paddle/pull/66040), [#65993](https://github.com/PaddlePaddle/Paddle/pull/65993), [#66058](https://github.com/PaddlePaddle/Paddle/pull/66058), [#66280](https://github.com/PaddlePaddle/Paddle/pull/66280), [#66037](https://github.com/PaddlePaddle/Paddle/pull/66037), [#66057](https://github.com/PaddlePaddle/Paddle/pull/66057), [#66077](https://github.com/PaddlePaddle/Paddle/pull/66077), [#66051](https://github.com/PaddlePaddle/Paddle/pull/66051), [#65912](https://github.com/PaddlePaddle/Paddle/pull/65912), [#66090](https://github.com/PaddlePaddle/Paddle/pull/66090), [#66189](https://github.com/PaddlePaddle/Paddle/pull/66189), [#66127](https://github.com/PaddlePaddle/Paddle/pull/66127), [#66277](https://github.com/PaddlePaddle/Paddle/pull/66277), [#66119](https://github.com/PaddlePaddle/Paddle/pull/66119), [#66270](https://github.com/PaddlePaddle/Paddle/pull/66270), [#66305](https://github.com/PaddlePaddle/Paddle/pull/66305), [#66306](https://github.com/PaddlePaddle/Paddle/pull/66306), [#66279](https://github.com/PaddlePaddle/Paddle/pull/66279), [#66276](https://github.com/PaddlePaddle/Paddle/pull/66276), [#66295](https://github.com/PaddlePaddle/Paddle/pull/66295), [#66301](https://github.com/PaddlePaddle/Paddle/pull/66301), [#66473](https://github.com/PaddlePaddle/Paddle/pull/66473), [#66384](https://github.com/PaddlePaddle/Paddle/pull/66384), [#66505](https://github.com/PaddlePaddle/Paddle/pull/66505), [#66328](https://github.com/PaddlePaddle/Paddle/pull/66328), 
[#66394](https://github.com/PaddlePaddle/Paddle/pull/66394), [#66392](https://github.com/PaddlePaddle/Paddle/pull/66392), [#66432](https://github.com/PaddlePaddle/Paddle/pull/66432), [#66575](https://github.com/PaddlePaddle/Paddle/pull/66575), [#66572](https://github.com/PaddlePaddle/Paddle/pull/66572), [#66656](https://github.com/PaddlePaddle/Paddle/pull/66656), [#66475](https://github.com/PaddlePaddle/Paddle/pull/66475), [#66654](https://github.com/PaddlePaddle/Paddle/pull/66654), [#66616](https://github.com/PaddlePaddle/Paddle/pull/66616), [#66694](https://github.com/PaddlePaddle/Paddle/pull/66694), [#66686](https://github.com/PaddlePaddle/Paddle/pull/66686), [#66766](https://github.com/PaddlePaddle/Paddle/pull/66766), [#66749](https://github.com/PaddlePaddle/Paddle/pull/66749), [#66760](https://github.com/PaddlePaddle/Paddle/pull/66760), [#66803](https://github.com/PaddlePaddle/Paddle/pull/66803), [#66770](https://github.com/PaddlePaddle/Paddle/pull/66770), [#66693](https://github.com/PaddlePaddle/Paddle/pull/66693), [#66771](https://github.com/PaddlePaddle/Paddle/pull/66771), [#66792](https://github.com/PaddlePaddle/Paddle/pull/66792), [#66862](https://github.com/PaddlePaddle/Paddle/pull/66862), [#66867](https://github.com/PaddlePaddle/Paddle/pull/66867), [#66684](https://github.com/PaddlePaddle/Paddle/pull/66684), [#66966](https://github.com/PaddlePaddle/Paddle/pull/66966), [#66793](https://github.com/PaddlePaddle/Paddle/pull/66793), [#66987](https://github.com/PaddlePaddle/Paddle/pull/66987), [#66985](https://github.com/PaddlePaddle/Paddle/pull/66985), [#66989](https://github.com/PaddlePaddle/Paddle/pull/66989), [#66639](https://github.com/PaddlePaddle/Paddle/pull/66639), [#66994](https://github.com/PaddlePaddle/Paddle/pull/66994), [#66986](https://github.com/PaddlePaddle/Paddle/pull/66986), [#66993](https://github.com/PaddlePaddle/Paddle/pull/66993), [#67002](https://github.com/PaddlePaddle/Paddle/pull/67002), 
[#66996](https://github.com/PaddlePaddle/Paddle/pull/66996), [#67001](https://github.com/PaddlePaddle/Paddle/pull/67001), [#66864](https://github.com/PaddlePaddle/Paddle/pull/66864), [#67031](https://github.com/PaddlePaddle/Paddle/pull/67031), [#67089](https://github.com/PaddlePaddle/Paddle/pull/67089), [#67143](https://github.com/PaddlePaddle/Paddle/pull/67143), [#67179](https://github.com/PaddlePaddle/Paddle/pull/67179), [#67178](https://github.com/PaddlePaddle/Paddle/pull/67178), [#67284](https://github.com/PaddlePaddle/Paddle/pull/67284), [#67104](https://github.com/PaddlePaddle/Paddle/pull/67104), [#67079](https://github.com/PaddlePaddle/Paddle/pull/67079), [#67132](https://github.com/PaddlePaddle/Paddle/pull/67132), [#67147](https://github.com/PaddlePaddle/Paddle/pull/67147), [#67204](https://github.com/PaddlePaddle/Paddle/pull/67204), [#67112](https://github.com/PaddlePaddle/Paddle/pull/67112), [#67233](https://github.com/PaddlePaddle/Paddle/pull/67233), [#67366](https://github.com/PaddlePaddle/Paddle/pull/67366), [#67067](https://github.com/PaddlePaddle/Paddle/pull/67067), [#67391](https://github.com/PaddlePaddle/Paddle/pull/67391), [#67428](https://github.com/PaddlePaddle/Paddle/pull/67428), [#67197](https://github.com/PaddlePaddle/Paddle/pull/67197), [#67047](https://github.com/PaddlePaddle/Paddle/pull/67047), [#66890](https://github.com/PaddlePaddle/Paddle/pull/66890), [#67159](https://github.com/PaddlePaddle/Paddle/pull/67159), [#67439](https://github.com/PaddlePaddle/Paddle/pull/67439), [#67555](https://github.com/PaddlePaddle/Paddle/pull/67555), [#67448](https://github.com/PaddlePaddle/Paddle/pull/67448), [#67556](https://github.com/PaddlePaddle/Paddle/pull/67556), [#67469](https://github.com/PaddlePaddle/Paddle/pull/67469), [#67558](https://github.com/PaddlePaddle/Paddle/pull/67558), [#67405](https://github.com/PaddlePaddle/Paddle/pull/67405), [#67644](https://github.com/PaddlePaddle/Paddle/pull/67644), 
[#67624](https://github.com/PaddlePaddle/Paddle/pull/67624), [#67679](https://github.com/PaddlePaddle/Paddle/pull/67679), [#67677](https://github.com/PaddlePaddle/Paddle/pull/67677), [#67785](https://github.com/PaddlePaddle/Paddle/pull/67785), [#67767](https://github.com/PaddlePaddle/Paddle/pull/67767), [#65319](https://github.com/PaddlePaddle/Paddle/pull/65319), [#65277](https://github.com/PaddlePaddle/Paddle/pull/65277), [#67673](https://github.com/PaddlePaddle/Paddle/pull/67673), [#65557](https://github.com/PaddlePaddle/Paddle/pull/65557), [#67527](https://github.com/PaddlePaddle/Paddle/pull/67527), [#66965](https://github.com/PaddlePaddle/Paddle/pull/66965), [#65905](https://github.com/PaddlePaddle/Paddle/pull/65905), [#65657](https://github.com/PaddlePaddle/Paddle/pull/65657), [#66357](https://github.com/PaddlePaddle/Paddle/pull/66357), [#68163](https://github.com/PaddlePaddle/Paddle/pull/68163)
-- Optimized the error messages of many PaddlePaddle APIs, making the errors more understandable. [#67148](https://github.com/PaddlePaddle/Paddle/pull/67148), [#67154](https://github.com/PaddlePaddle/Paddle/pull/67154), [#67546](https://github.com/PaddlePaddle/Paddle/pull/67546), [#67335](https://github.com/PaddlePaddle/Paddle/pull/67335), [#67255](https://github.com/PaddlePaddle/Paddle/pull/67255), [#67099](https://github.com/PaddlePaddle/Paddle/pull/67099), [#67074](https://github.com/PaddlePaddle/Paddle/pull/67074), [#67073](https://github.com/PaddlePaddle/Paddle/pull/67073), [#66957](https://github.com/PaddlePaddle/Paddle/pull/66957), [#67063](https://github.com/PaddlePaddle/Paddle/pull/67063), [#67575](https://github.com/PaddlePaddle/Paddle/pull/67575), [#67608](https://github.com/PaddlePaddle/Paddle/pull/67608), [#67634](https://github.com/PaddlePaddle/Paddle/pull/67634), [#67325](https://github.com/PaddlePaddle/Paddle/pull/67325), [#67429](https://github.com/PaddlePaddle/Paddle/pull/67429), [#67401](https://github.com/PaddlePaddle/Paddle/pull/67401), [#66881](https://github.com/PaddlePaddle/Paddle/pull/66881), [#68492](https://github.com/PaddlePaddle/Paddle/pull/68492), [#67695](https://github.com/PaddlePaddle/Paddle/pull/67695), [#69833](https://github.com/PaddlePaddle/Paddle/pull/69833), [#70398](https://github.com/PaddlePaddle/Paddle/pull/70398)
+- Added `paddle.randn_like` API. [#72492](https://github.com/PaddlePaddle/Paddle/pull/72492)
### Bug Fixes
-- Fixed a bug in `paddle.nn.functional.max_unpool1d` when the input `output_size` is a tuple. [#65910](https://github.com/PaddlePaddle/Paddle/pull/65910)
-- Fixed the issue where `paddle.base.core.eager.Tensor` did not support paddle::DataType. [#66765](https://github.com/PaddlePaddle/Paddle/pull/66765)
-- Fixed the issue where an error occurred during BF16 training when the pir switch was turned on. [#66833](https://github.com/PaddlePaddle/Paddle/pull/66833)
-- Fixed the issue of bias in the linear layer during parallel processing in the pipeline. [#67212](https://github.com/PaddlePaddle/Paddle/pull/67212)
-- Fixed the error issue when using loss for judgment in parallel pipeline. [#66980](https://github.com/PaddlePaddle/Paddle/pull/66980)
-- Fixed the error issue when using `paddle.Tensor.item` in parallel pipeline. [#67441](https://github.com/PaddlePaddle/Paddle/pull/67441)
-- Fixed bugs in `paddle.einsum` in specific scenarios. [#67588](https://github.com/PaddlePaddle/Paddle/pull/67588)
-- Fixed the error issue of `paddle.nn.SyncBatchNorm` during gradient computation. [#67559](https://github.com/PaddlePaddle/Paddle/pull/67559)
-- Fixed the issue mentioned in [issue #69992](https://github.com/PaddlePaddle/Paddle/issues/69992). [#70017](https://github.com/PaddlePaddle/Paddle/pull/70017)
-- Fixed the issue where `paddle.arange` produced incorrect results when dealing with large integers. [#70188](https://github.com/PaddlePaddle/Paddle/pull/70188)
-- Fixed the issue where `paddle.max` and `paddle.min` propagated incorrectly when there were nan values in the input. [#70049](https://github.com/PaddlePaddle/Paddle/pull/70049)
-- Fixed issues with APIs such as `paddle.linalg.svd` and `paddle.linalg.any` when handling 0-size Tensor. [#70235](https://github.com/PaddlePaddle/Paddle/pull/70235), [#70489](https://github.com/PaddlePaddle/Paddle/pull/70489), [#70047](https://github.com/PaddlePaddle/Paddle/pull/70047), [#70103](https://github.com/PaddlePaddle/Paddle/pull/70103), [#70127](https://github.com/PaddlePaddle/Paddle/pull/70127), [#70098](https://github.com/PaddlePaddle/Paddle/pull/70098), [#70077](https://github.com/PaddlePaddle/Paddle/pull/70077), [#70130](https://github.com/PaddlePaddle/Paddle/pull/70130), [#70254](https://github.com/PaddlePaddle/Paddle/pull/70254), [#70125](https://github.com/PaddlePaddle/Paddle/pull/70125), [#70342](https://github.com/PaddlePaddle/Paddle/pull/70342), [#70369](https://github.com/PaddlePaddle/Paddle/pull/70369), [#71094](https://github.com/PaddlePaddle/Paddle/pull/71094), [#71089](https://github.com/PaddlePaddle/Paddle/pull/71089), [#71185](https://github.com/PaddlePaddle/Paddle/pull/71185), [#70537](https://github.com/PaddlePaddle/Paddle/pull/70537), [#70481](https://github.com/PaddlePaddle/Paddle/pull/70481)
-- Fixed some issues with type hint annotations and documentation issues. [#65429](https://github.com/PaddlePaddle/Paddle/pull/65429), [#65496](https://github.com/PaddlePaddle/Paddle/pull/65496), [#65461](https://github.com/PaddlePaddle/Paddle/pull/65461), [#65542](https://github.com/PaddlePaddle/Paddle/pull/65542), [#65575](https://github.com/PaddlePaddle/Paddle/pull/65575), [#65545](https://github.com/PaddlePaddle/Paddle/pull/65545), [#65609](https://github.com/PaddlePaddle/Paddle/pull/65609), [#65644](https://github.com/PaddlePaddle/Paddle/pull/65644), [#65700](https://github.com/PaddlePaddle/Paddle/pull/65700), [#65697](https://github.com/PaddlePaddle/Paddle/pull/65697), [#65719](https://github.com/PaddlePaddle/Paddle/pull/65719), [#65639](https://github.com/PaddlePaddle/Paddle/pull/65639), [#65742](https://github.com/PaddlePaddle/Paddle/pull/65742), [#65891](https://github.com/PaddlePaddle/Paddle/pull/65891), [#65877](https://github.com/PaddlePaddle/Paddle/pull/65877), [#65895](https://github.com/PaddlePaddle/Paddle/pull/65895), [#66007](https://github.com/PaddlePaddle/Paddle/pull/66007), [#66679](https://github.com/PaddlePaddle/Paddle/pull/66679), [#66680](https://github.com/PaddlePaddle/Paddle/pull/66680), [#66676](https://github.com/PaddlePaddle/Paddle/pull/66676), [#66677](https://github.com/PaddlePaddle/Paddle/pull/66677), [#66884](https://github.com/PaddlePaddle/Paddle/pull/66884), [#67288](https://github.com/PaddlePaddle/Paddle/pull/67288), [#67302](https://github.com/PaddlePaddle/Paddle/pull/67302), [#66978](https://github.com/PaddlePaddle/Paddle/pull/66978), [#67295](https://github.com/PaddlePaddle/Paddle/pull/67295), [#67520](https://github.com/PaddlePaddle/Paddle/pull/67520), [#67421](https://github.com/PaddlePaddle/Paddle/pull/67421), [#67529](https://github.com/PaddlePaddle/Paddle/pull/67529), [#67536](https://github.com/PaddlePaddle/Paddle/pull/67536), [#67618](https://github.com/PaddlePaddle/Paddle/pull/67618), 
[#67661](https://github.com/PaddlePaddle/Paddle/pull/67661), [#67698](https://github.com/PaddlePaddle/Paddle/pull/67698), [#67800](https://github.com/PaddlePaddle/Paddle/pull/67800), [#67933](https://github.com/PaddlePaddle/Paddle/pull/67933), [#67893](https://github.com/PaddlePaddle/Paddle/pull/67893), [#68108](https://github.com/PaddlePaddle/Paddle/pull/68108), [#67927](https://github.com/PaddlePaddle/Paddle/pull/67927), [#68322](https://github.com/PaddlePaddle/Paddle/pull/68322), [#68341](https://github.com/PaddlePaddle/Paddle/pull/68341), [#68415](https://github.com/PaddlePaddle/Paddle/pull/68415), [#68372](https://github.com/PaddlePaddle/Paddle/pull/68372), [#68559](https://github.com/PaddlePaddle/Paddle/pull/68559), [#68598](https://github.com/PaddlePaddle/Paddle/pull/68598), [#68708](https://github.com/PaddlePaddle/Paddle/pull/68708), [#68780](https://github.com/PaddlePaddle/Paddle/pull/68780), [#68992](https://github.com/PaddlePaddle/Paddle/pull/68992), [#68989](https://github.com/PaddlePaddle/Paddle/pull/68989), [#68895](https://github.com/PaddlePaddle/Paddle/pull/68895), [#69014](https://github.com/PaddlePaddle/Paddle/pull/69014), [#69139](https://github.com/PaddlePaddle/Paddle/pull/69139), [#68996](https://github.com/PaddlePaddle/Paddle/pull/68996), [#69090](https://github.com/PaddlePaddle/Paddle/pull/69090), [#68922](https://github.com/PaddlePaddle/Paddle/pull/68922), [#69333](https://github.com/PaddlePaddle/Paddle/pull/69333), [#69141](https://github.com/PaddlePaddle/Paddle/pull/69141), [#69609](https://github.com/PaddlePaddle/Paddle/pull/69609), [#69652](https://github.com/PaddlePaddle/Paddle/pull/69652), [#69715](https://github.com/PaddlePaddle/Paddle/pull/69715), [#69716](https://github.com/PaddlePaddle/Paddle/pull/69716), [#69934](https://github.com/PaddlePaddle/Paddle/pull/69934), [#70253](https://github.com/PaddlePaddle/Paddle/pull/70253), [#70297](https://github.com/PaddlePaddle/Paddle/pull/70297), 
[#70252](https://github.com/PaddlePaddle/Paddle/pull/70252), [#70468](https://github.com/PaddlePaddle/Paddle/pull/70468), [#70102](https://github.com/PaddlePaddle/Paddle/pull/70102), [#70546](https://github.com/PaddlePaddle/Paddle/pull/70546), [#70616](https://github.com/PaddlePaddle/Paddle/pull/70616), [#70582](https://github.com/PaddlePaddle/Paddle/pull/70582), [#70635](https://github.com/PaddlePaddle/Paddle/pull/70635), [#70499](https://github.com/PaddlePaddle/Paddle/pull/70499), [#70755](https://github.com/PaddlePaddle/Paddle/pull/70755), [#70935](https://github.com/PaddlePaddle/Paddle/pull/70935), [#71133](https://github.com/PaddlePaddle/Paddle/pull/71133), [#71172](https://github.com/PaddlePaddle/Paddle/pull/71172), [#71238](https://github.com/PaddlePaddle/Paddle/pull/71238), [#71230](https://github.com/PaddlePaddle/Paddle/pull/71230), [#71394](https://github.com/PaddlePaddle/Paddle/pull/71394)
+- Fixed the issue of inconsistent input and output types in the `tensordot` API. [#72139](https://github.com/PaddlePaddle/Paddle/pull/72139)
+- Fixed the issue where the output of the `atleast` API was a Tensor list. [#73102](https://github.com/PaddlePaddle/Paddle/pull/73102)
+- Fixed the issue with the `nonzero` API. [#72003](https://github.com/PaddlePaddle/Paddle/pull/72003)
+- Fixed the memory leak issue in `dualpipev`. [#72070](https://github.com/PaddlePaddle/Paddle/pull/72070)
+- Fixed the overflow issue in `softmax` calculation. [#71935](https://github.com/PaddlePaddle/Paddle/pull/71935)
+- Fixed the shape checking issue in `take_along_axis` when `broadcast=False`. [#72436](https://github.com/PaddlePaddle/Paddle/pull/72436)
+- Fixed the incorrect handling of NaN input in the `maximum` and `minimum` functions. [#71933](https://github.com/PaddlePaddle/Paddle/pull/71933)
+- Fixed the issue with `visit_type`. [#72782](https://github.com/PaddlePaddle/Paddle/pull/72782)
+- Fixed the int32 out-of-bounds issue in `gather_scatter_functor`. [#72905](https://github.com/PaddlePaddle/Paddle/pull/72905)
+- Fixed the inplace implementation of `Bernoulli` in PaddlePaddle. [#73271](https://github.com/PaddlePaddle/Paddle/pull/73271)
+- Fixed issues with `moe_permute` and `moe_unpermute`. [#73365](https://github.com/PaddlePaddle/Paddle/pull/73365)
+- Fixed the syntax checking issue of `ast.parse` for pyi files. [#71872](https://github.com/PaddlePaddle/Paddle/pull/71872)
+- Fixed the issue of complex division. [#73331](https://github.com/PaddlePaddle/Paddle/pull/73331)
+- Fixed issues related to TensorRT integration. [#72302](https://github.com/PaddlePaddle/Paddle/pull/72302), [#72278](https://github.com/PaddlePaddle/Paddle/pull/72278)
-### Document optimization
+### Improvements
-- Enhanced several API documents to make them easier to read and understand. [#67772](https://github.com/PaddlePaddle/Paddle/pull/67772), [#69895](https://github.com/PaddlePaddle/Paddle/pull/69895), [#65904](https://github.com/PaddlePaddle/Paddle/pull/65904), [#66480](https://github.com/PaddlePaddle/Paddle/pull/66480), [#66974](https://github.com/PaddlePaddle/Paddle/pull/66974), [#67100](https://github.com/PaddlePaddle/Paddle/pull/67100), [#66991](https://github.com/PaddlePaddle/Paddle/pull/66991), [#67287](https://github.com/PaddlePaddle/Paddle/pull/67287), [#67841](https://github.com/PaddlePaddle/Paddle/pull/67841), [#68206](https://github.com/PaddlePaddle/Paddle/pull/68206), [#68305](https://github.com/PaddlePaddle/Paddle/pull/68305), [#68462](https://github.com/PaddlePaddle/Paddle/pull/68462), [#67061](https://github.com/PaddlePaddle/Paddle/pull/67061), [#66503](https://github.com/PaddlePaddle/Paddle/pull/66503), [#68856](https://github.com/PaddlePaddle/Paddle/pull/68856), [#68866](https://github.com/PaddlePaddle/Paddle/pull/68866), [#68768](https://github.com/PaddlePaddle/Paddle/pull/68768), [#69215](https://github.com/PaddlePaddle/Paddle/pull/69215), [#69449](https://github.com/PaddlePaddle/Paddle/pull/69449), [#69396](https://github.com/PaddlePaddle/Paddle/pull/69396), [#69498](https://github.com/PaddlePaddle/Paddle/pull/69498), [#69413](https://github.com/PaddlePaddle/Paddle/pull/69413), [#69404](https://github.com/PaddlePaddle/Paddle/pull/69404), [#69729](https://github.com/PaddlePaddle/Paddle/pull/69729), [#69749](https://github.com/PaddlePaddle/Paddle/pull/69749), [#69266](https://github.com/PaddlePaddle/Paddle/pull/69266), [#69989](https://github.com/PaddlePaddle/Paddle/pull/69989), [#70209](https://github.com/PaddlePaddle/Paddle/pull/70209), [#70128](https://github.com/PaddlePaddle/Paddle/pull/70128), [#70143](https://github.com/PaddlePaddle/Paddle/pull/70143), [#69874](https://github.com/PaddlePaddle/Paddle/pull/69874), 
[#70242](https://github.com/PaddlePaddle/Paddle/pull/70242), [#70145](https://github.com/PaddlePaddle/Paddle/pull/70145), [#70813](https://github.com/PaddlePaddle/Paddle/pull/70813), [#71046](https://github.com/PaddlePaddle/Paddle/pull/71046)
+- Enhanced API functionality and usability. This includes but is not limited to expanding the data types supported by APIs, validating API parameters, correcting default values of API parameters, and refining API return values. [#71997](https://github.com/PaddlePaddle/Paddle/pull/71997), [#72911](https://github.com/PaddlePaddle/Paddle/pull/72911), [#72985](https://github.com/PaddlePaddle/Paddle/pull/72985), [#73240](https://github.com/PaddlePaddle/Paddle/pull/73240), [#72927](https://github.com/PaddlePaddle/Paddle/pull/72927), [#73451](https://github.com/PaddlePaddle/Paddle/pull/73451), [#73416](https://github.com/PaddlePaddle/Paddle/pull/73416), [#73420](https://github.com/PaddlePaddle/Paddle/pull/73420), [#73347](https://github.com/PaddlePaddle/Paddle/pull/73347), [#73050](https://github.com/PaddlePaddle/Paddle/pull/73050), [#73246](https://github.com/PaddlePaddle/Paddle/pull/73246), [#73123](https://github.com/PaddlePaddle/Paddle/pull/73123), [#73336](https://github.com/PaddlePaddle/Paddle/pull/73336), [#73062](https://github.com/PaddlePaddle/Paddle/pull/73062), [#72201](https://github.com/PaddlePaddle/Paddle/pull/72201), [#72190](https://github.com/PaddlePaddle/Paddle/pull/72190)
+- Enhanced API support for complex types. [#72279](https://github.com/PaddlePaddle/Paddle/pull/72279), [#72308](https://github.com/PaddlePaddle/Paddle/pull/72308), [#72518](https://github.com/PaddlePaddle/Paddle/pull/72518), [#72391](https://github.com/PaddlePaddle/Paddle/pull/72391), [#72239](https://github.com/PaddlePaddle/Paddle/pull/72239), [#72286](https://github.com/PaddlePaddle/Paddle/pull/72286), [#72169](https://github.com/PaddlePaddle/Paddle/pull/72169), [#72577](https://github.com/PaddlePaddle/Paddle/pull/72577), [#72619](https://github.com/PaddlePaddle/Paddle/pull/72619)
+- Enhanced API support for 0-Size Tensor. [#72570](https://github.com/PaddlePaddle/Paddle/pull/72570), [#72692](https://github.com/PaddlePaddle/Paddle/pull/72692), [#72138](https://github.com/PaddlePaddle/Paddle/pull/72138), [#72410](https://github.com/PaddlePaddle/Paddle/pull/72410), [#72565](https://github.com/PaddlePaddle/Paddle/pull/72565), [#72262](https://github.com/PaddlePaddle/Paddle/pull/72262)
+- Corrected spelling errors in the API code. [#71780](https://github.com/PaddlePaddle/Paddle/pull/71780), [#71786](https://github.com/PaddlePaddle/Paddle/pull/71786), [#72093](https://github.com/PaddlePaddle/Paddle/pull/72093), [#72113](https://github.com/PaddlePaddle/Paddle/pull/72113), [#72241](https://github.com/PaddlePaddle/Paddle/pull/72241), [#72237](https://github.com/PaddlePaddle/Paddle/pull/72237), [#72590](https://github.com/PaddlePaddle/Paddle/pull/72590), [#72591](https://github.com/PaddlePaddle/Paddle/pull/72591), [#72769](https://github.com/PaddlePaddle/Paddle/pull/72769), [#72858](https://github.com/PaddlePaddle/Paddle/pull/72858), [#73045](https://github.com/PaddlePaddle/Paddle/pull/73045), [#72195](https://github.com/PaddlePaddle/Paddle/pull/72195), [#72627](https://github.com/PaddlePaddle/Paddle/pull/72627), [#72657](https://github.com/PaddlePaddle/Paddle/pull/72657), [#73162](https://github.com/PaddlePaddle/Paddle/pull/73162), [#73402](https://github.com/PaddlePaddle/Paddle/pull/73402), [#72208](https://github.com/PaddlePaddle/Paddle/pull/72208), [#72659](https://github.com/PaddlePaddle/Paddle/pull/72659), [#72658](https://github.com/PaddlePaddle/Paddle/pull/72658), [#72660](https://github.com/PaddlePaddle/Paddle/pull/72660), [#72661](https://github.com/PaddlePaddle/Paddle/pull/72661), [#72656](https://github.com/PaddlePaddle/Paddle/pull/72656)
+- Optimized communication to reduce peak memory usage. [#72035](https://github.com/PaddlePaddle/Paddle/pull/72035)
-## 2. Basic execution architecture
+### Docs
-PIR is fully implemented and enabled by default, supporting one-click transition from motion to stillness, ensuring excellent performance and good scalability of the framework.
+- Fixed errors in the documentation, improving its usability and user experience. [#72549](https://github.com/PaddlePaddle/Paddle/pull/72549), [#73036](https://github.com/PaddlePaddle/Paddle/pull/73036)
-### Bug Fixes
-
-- Fixed accuracy issues caused by parameter configuration. [#65814](https://github.com/PaddlePaddle/Paddle/pull/65814)
-- Fixed bugs related to save/load. [#65268](https://github.com/PaddlePaddle/Paddle/pull/65268), [#65359](https://github.com/PaddlePaddle/Paddle/pull/65359), [#65373](https://github.com/PaddlePaddle/Paddle/pull/65373), [#65314](https://github.com/PaddlePaddle/Paddle/pull/65314), [#65446](https://github.com/PaddlePaddle/Paddle/pull/65446), [#65476](https://github.com/PaddlePaddle/Paddle/pull/65476), [#66891](https://github.com/PaddlePaddle/Paddle/pull/66891), [#66931](https://github.com/PaddlePaddle/Paddle/pull/66931), [#65978](https://github.com/PaddlePaddle/Paddle/pull/65978), [#67654](https://github.com/PaddlePaddle/Paddle/pull/67654), [#67906](https://github.com/PaddlePaddle/Paddle/pull/67906), [#68723](https://github.com/PaddlePaddle/Paddle/pull/68723), [#71452](https://github.com/PaddlePaddle/Paddle/pull/71452), [#71457](https://github.com/PaddlePaddle/Paddle/pull/71457), [#67819](https://github.com/PaddlePaddle/Paddle/pull/67819), [#68120](https://github.com/PaddlePaddle/Paddle/pull/68120), [#68300](https://github.com/PaddlePaddle/Paddle/pull/68300), [#68315](https://github.com/PaddlePaddle/Paddle/pull/68315), [#68743](https://github.com/PaddlePaddle/Paddle/pull/68743), [#68744](https://github.com/PaddlePaddle/Paddle/pull/68744), [#69585](https://github.com/PaddlePaddle/Paddle/pull/69585), [#71165](https://github.com/PaddlePaddle/Paddle/pull/71165), [#71400](https://github.com/PaddlePaddle/Paddle/pull/71400)
-- Skip/fix failed unit tests in PIR mode, including scenarios such as Windows and XPU. [#65690](https://github.com/PaddlePaddle/Paddle/pull/65690), [#65759](https://github.com/PaddlePaddle/Paddle/pull/65759), [#65730](https://github.com/PaddlePaddle/Paddle/pull/65730), [#65760](https://github.com/PaddlePaddle/Paddle/pull/65760), [#65833](https://github.com/PaddlePaddle/Paddle/pull/65833), [#65834](https://github.com/PaddlePaddle/Paddle/pull/65834), [#65856](https://github.com/PaddlePaddle/Paddle/pull/65856), [#65886](https://github.com/PaddlePaddle/Paddle/pull/65886), [#65899](https://github.com/PaddlePaddle/Paddle/pull/65899), [#65932](https://github.com/PaddlePaddle/Paddle/pull/65932), [#65998](https://github.com/PaddlePaddle/Paddle/pull/65998), [#65953](https://github.com/PaddlePaddle/Paddle/pull/65953), [#65997](https://github.com/PaddlePaddle/Paddle/pull/65997), [#66061](https://github.com/PaddlePaddle/Paddle/pull/66061), [#66111](https://github.com/PaddlePaddle/Paddle/pull/66111), [#66137](https://github.com/PaddlePaddle/Paddle/pull/66137), [#66073](https://github.com/PaddlePaddle/Paddle/pull/66073), [#66203](https://github.com/PaddlePaddle/Paddle/pull/66203), [#66227](https://github.com/PaddlePaddle/Paddle/pull/66227), [#65744](https://github.com/PaddlePaddle/Paddle/pull/65744), [#66234](https://github.com/PaddlePaddle/Paddle/pull/66234), [#67487](https://github.com/PaddlePaddle/Paddle/pull/67487), [#67561](https://github.com/PaddlePaddle/Paddle/pull/67561), [#67584](https://github.com/PaddlePaddle/Paddle/pull/67584), [#67742](https://github.com/PaddlePaddle/Paddle/pull/67742), [#69832](https://github.com/PaddlePaddle/Paddle/pull/69832), [#65885](https://github.com/PaddlePaddle/Paddle/pull/65885), [#66709](https://github.com/PaddlePaddle/Paddle/pull/66709), [#66734](https://github.com/PaddlePaddle/Paddle/pull/66734), [#66959](https://github.com/PaddlePaddle/Paddle/pull/66959), [#67399](https://github.com/PaddlePaddle/Paddle/pull/67399), 
[#67389](https://github.com/PaddlePaddle/Paddle/pull/67389), [#67230](https://github.com/PaddlePaddle/Paddle/pull/67230), [#67403](https://github.com/PaddlePaddle/Paddle/pull/67403), [#67619](https://github.com/PaddlePaddle/Paddle/pull/67619), [#67662](https://github.com/PaddlePaddle/Paddle/pull/67662), [#67902](https://github.com/PaddlePaddle/Paddle/pull/67902), [#67382](https://github.com/PaddlePaddle/Paddle/pull/67382), [#67430](https://github.com/PaddlePaddle/Paddle/pull/67430), [#67517](https://github.com/PaddlePaddle/Paddle/pull/67517), [#67533](https://github.com/PaddlePaddle/Paddle/pull/67533), [#67573](https://github.com/PaddlePaddle/Paddle/pull/67573), [#67468](https://github.com/PaddlePaddle/Paddle/pull/67468), [#67640](https://github.com/PaddlePaddle/Paddle/pull/67640), [#67667](https://github.com/PaddlePaddle/Paddle/pull/67667), [#67716](https://github.com/PaddlePaddle/Paddle/pull/67716), [#68386](https://github.com/PaddlePaddle/Paddle/pull/68386), [#67234](https://github.com/PaddlePaddle/Paddle/pull/67234), [#67266](https://github.com/PaddlePaddle/Paddle/pull/67266), [#67362](https://github.com/PaddlePaddle/Paddle/pull/67362), [#67631](https://github.com/PaddlePaddle/Paddle/pull/67631), [#68081](https://github.com/PaddlePaddle/Paddle/pull/68081)
-- Fixed bugs related to dynamic graphs. [#65619](https://github.com/PaddlePaddle/Paddle/pull/65619), [#69163](https://github.com/PaddlePaddle/Paddle/pull/69163), [#68862](https://github.com/PaddlePaddle/Paddle/pull/68862), [#68164](https://github.com/PaddlePaddle/Paddle/pull/68164), [#69867](https://github.com/PaddlePaddle/Paddle/pull/69867)
-- Fixed bugs related to control flow. [#65722](https://github.com/PaddlePaddle/Paddle/pull/65722), [#70181](https://github.com/PaddlePaddle/Paddle/pull/70181)
-- Fixed kernel operation-related bugs, including issues with operation positions and null pointers. [#66334](https://github.com/PaddlePaddle/Paddle/pull/66334), [#67931](https://github.com/PaddlePaddle/Paddle/pull/67931), [#70353](https://github.com/PaddlePaddle/Paddle/pull/70353)
-- Fixed Amp-related bugs. [#66778](https://github.com/PaddlePaddle/Paddle/pull/66778), [#67582](https://github.com/PaddlePaddle/Paddle/pull/67582), [#67704](https://github.com/PaddlePaddle/Paddle/pull/67704), [#68655](https://github.com/PaddlePaddle/Paddle/pull/68655)
-- Fixed CINN-related bugs. [#69577](https://github.com/PaddlePaddle/Paddle/pull/69577), [#71101](https://github.com/PaddlePaddle/Paddle/pull/71101), [#71387](https://github.com/PaddlePaddle/Paddle/pull/71387), [#71401](https://github.com/PaddlePaddle/Paddle/pull/71401)
-- Fixed the bug related to the transition from dynamic to static. [#67617](https://github.com/PaddlePaddle/Paddle/pull/67617), [#67936](https://github.com/PaddlePaddle/Paddle/pull/67936), [#68938](https://github.com/PaddlePaddle/Paddle/pull/68938), [#68734](https://github.com/PaddlePaddle/Paddle/pull/68734), [#69010](https://github.com/PaddlePaddle/Paddle/pull/69010), [#69408](https://github.com/PaddlePaddle/Paddle/pull/69408), [#69461](https://github.com/PaddlePaddle/Paddle/pull/69461), [#69699](https://github.com/PaddlePaddle/Paddle/pull/69699), [#69774](https://github.com/PaddlePaddle/Paddle/pull/69774), [#69803](https://github.com/PaddlePaddle/Paddle/pull/69803), [#69853](https://github.com/PaddlePaddle/Paddle/pull/69853), [#70510](https://github.com/PaddlePaddle/Paddle/pull/70510), [#70830](https://github.com/PaddlePaddle/Paddle/pull/70830), [#70904](https://github.com/PaddlePaddle/Paddle/pull/70904), [#70913](https://github.com/PaddlePaddle/Paddle/pull/70913), [#71040](https://github.com/PaddlePaddle/Paddle/pull/71040), [#71048](https://github.com/PaddlePaddle/Paddle/pull/71048), [#71106](https://github.com/PaddlePaddle/Paddle/pull/71106), [#71201](https://github.com/PaddlePaddle/Paddle/pull/71201), [#71216](https://github.com/PaddlePaddle/Paddle/pull/71216), [#71223](https://github.com/PaddlePaddle/Paddle/pull/71223), [#71296](https://github.com/PaddlePaddle/Paddle/pull/71296), [#71385](https://github.com/PaddlePaddle/Paddle/pull/71385), [#71505](https://github.com/PaddlePaddle/Paddle/pull/71505), [#66934](https://github.com/PaddlePaddle/Paddle/pull/66934), [#71096](https://github.com/PaddlePaddle/Paddle/pull/71096), [#71144](https://github.com/PaddlePaddle/Paddle/pull/71144), [#71430](https://github.com/PaddlePaddle/Paddle/pull/71430), [#71437](https://github.com/PaddlePaddle/Paddle/pull/71437), [#71473](https://github.com/PaddlePaddle/Paddle/pull/71473), [#71412](https://github.com/PaddlePaddle/Paddle/pull/71412), 
[#65648](https://github.com/PaddlePaddle/Paddle/pull/65648), [#67853](https://github.com/PaddlePaddle/Paddle/pull/67853), [#66543](https://github.com/PaddlePaddle/Paddle/pull/66543), [#68229](https://github.com/PaddlePaddle/Paddle/pull/68229), [#70846](https://github.com/PaddlePaddle/Paddle/pull/70846), [#67532](https://github.com/PaddlePaddle/Paddle/pull/67532)
-- Fixed other bugs, including issues related to backpropagation gradient calculation, memory copying, and executor errors. [#65493](https://github.com/PaddlePaddle/Paddle/pull/65493), [#65678](https://github.com/PaddlePaddle/Paddle/pull/65678), [#65673](https://github.com/PaddlePaddle/Paddle/pull/65673), [#65794](https://github.com/PaddlePaddle/Paddle/pull/65794), [#66358](https://github.com/PaddlePaddle/Paddle/pull/66358), [#66875](https://github.com/PaddlePaddle/Paddle/pull/66875), [#67339](https://github.com/PaddlePaddle/Paddle/pull/67339), [#67465](https://github.com/PaddlePaddle/Paddle/pull/67465), [#67754](https://github.com/PaddlePaddle/Paddle/pull/67754), [#67835](https://github.com/PaddlePaddle/Paddle/pull/67835), [#67892](https://github.com/PaddlePaddle/Paddle/pull/67892), [#67967](https://github.com/PaddlePaddle/Paddle/pull/67967), [#67952](https://github.com/PaddlePaddle/Paddle/pull/67952), [#68036](https://github.com/PaddlePaddle/Paddle/pull/68036), [#68063](https://github.com/PaddlePaddle/Paddle/pull/68063), [#68128](https://github.com/PaddlePaddle/Paddle/pull/68128), [#68151](https://github.com/PaddlePaddle/Paddle/pull/68151), [#68140](https://github.com/PaddlePaddle/Paddle/pull/68140), [#68167](https://github.com/PaddlePaddle/Paddle/pull/68167), [#68200](https://github.com/PaddlePaddle/Paddle/pull/68200), [#68325](https://github.com/PaddlePaddle/Paddle/pull/68325), [#68376](https://github.com/PaddlePaddle/Paddle/pull/68376), [#68539](https://github.com/PaddlePaddle/Paddle/pull/68539), [#68530](https://github.com/PaddlePaddle/Paddle/pull/68530), [#68637](https://github.com/PaddlePaddle/Paddle/pull/68637), [#68639](https://github.com/PaddlePaddle/Paddle/pull/68639), [#68688](https://github.com/PaddlePaddle/Paddle/pull/68688), [#68751](https://github.com/PaddlePaddle/Paddle/pull/68751), [#68806](https://github.com/PaddlePaddle/Paddle/pull/68806), [#68810](https://github.com/PaddlePaddle/Paddle/pull/68810), 
[#68779](https://github.com/PaddlePaddle/Paddle/pull/68779), [#68811](https://github.com/PaddlePaddle/Paddle/pull/68811), [#68844](https://github.com/PaddlePaddle/Paddle/pull/68844), [#68790](https://github.com/PaddlePaddle/Paddle/pull/68790), [#68870](https://github.com/PaddlePaddle/Paddle/pull/68870), [#68960](https://github.com/PaddlePaddle/Paddle/pull/68960), [#68999](https://github.com/PaddlePaddle/Paddle/pull/68999), [#69036](https://github.com/PaddlePaddle/Paddle/pull/69036), [#69188](https://github.com/PaddlePaddle/Paddle/pull/69188), [#69234](https://github.com/PaddlePaddle/Paddle/pull/69234), [#69375](https://github.com/PaddlePaddle/Paddle/pull/69375), [#69399](https://github.com/PaddlePaddle/Paddle/pull/69399), [#69538](https://github.com/PaddlePaddle/Paddle/pull/69538), [#69603](https://github.com/PaddlePaddle/Paddle/pull/69603), [#69633](https://github.com/PaddlePaddle/Paddle/pull/69633), [#69765](https://github.com/PaddlePaddle/Paddle/pull/69765), [#69768](https://github.com/PaddlePaddle/Paddle/pull/69768), [#69821](https://github.com/PaddlePaddle/Paddle/pull/69821), [#70091](https://github.com/PaddlePaddle/Paddle/pull/70091), [#70123](https://github.com/PaddlePaddle/Paddle/pull/70123), [#70147](https://github.com/PaddlePaddle/Paddle/pull/70147), [#70201](https://github.com/PaddlePaddle/Paddle/pull/70201), [#70198](https://github.com/PaddlePaddle/Paddle/pull/70198), [#69815](https://github.com/PaddlePaddle/Paddle/pull/69815), [#70420](https://github.com/PaddlePaddle/Paddle/pull/70420), [#70377](https://github.com/PaddlePaddle/Paddle/pull/70377), [#70552](https://github.com/PaddlePaddle/Paddle/pull/70552), [#70545](https://github.com/PaddlePaddle/Paddle/pull/70545), [#70595](https://github.com/PaddlePaddle/Paddle/pull/70595), [#70836](https://github.com/PaddlePaddle/Paddle/pull/70836), [#70771](https://github.com/PaddlePaddle/Paddle/pull/70771), [#70922](https://github.com/PaddlePaddle/Paddle/pull/70922), 
[#70969](https://github.com/PaddlePaddle/Paddle/pull/70969), [#70926](https://github.com/PaddlePaddle/Paddle/pull/70926), [#71117](https://github.com/PaddlePaddle/Paddle/pull/71117), [#71151](https://github.com/PaddlePaddle/Paddle/pull/71151), [#71194](https://github.com/PaddlePaddle/Paddle/pull/71194), [#71234](https://github.com/PaddlePaddle/Paddle/pull/71234), [#71339](https://github.com/PaddlePaddle/Paddle/pull/71339), [#71445](https://github.com/PaddlePaddle/Paddle/pull/71445), [#66350](https://github.com/PaddlePaddle/Paddle/pull/66350), [#66533](https://github.com/PaddlePaddle/Paddle/pull/66533), [#66622](https://github.com/PaddlePaddle/Paddle/pull/66622), [#67721](https://github.com/PaddlePaddle/Paddle/pull/67721), [#67700](https://github.com/PaddlePaddle/Paddle/pull/67700), [#69207](https://github.com/PaddlePaddle/Paddle/pull/69207), [#69615](https://github.com/PaddlePaddle/Paddle/pull/69615), [#69785](https://github.com/PaddlePaddle/Paddle/pull/69785), [#67805](https://github.com/PaddlePaddle/Paddle/pull/67805)
-
-### Function optimization
-
-- Support save/load. [#65296](https://github.com/PaddlePaddle/Paddle/pull/65296), [#65671](https://github.com/PaddlePaddle/Paddle/pull/65671), [#66231](https://github.com/PaddlePaddle/Paddle/pull/66231), [#66185](https://github.com/PaddlePaddle/Paddle/pull/66185), [#66722](https://github.com/PaddlePaddle/Paddle/pull/66722), [#66863](https://github.com/PaddlePaddle/Paddle/pull/66863), [#67057](https://github.com/PaddlePaddle/Paddle/pull/67057), [#68101](https://github.com/PaddlePaddle/Paddle/pull/68101), [#68628](https://github.com/PaddlePaddle/Paddle/pull/68628), [#66359](https://github.com/PaddlePaddle/Paddle/pull/66359), [#68481](https://github.com/PaddlePaddle/Paddle/pull/68481)
-- Optimize the compilation process of custom operators. [#67615](https://github.com/PaddlePaddle/Paddle/pull/67615), [#67659](https://github.com/PaddlePaddle/Paddle/pull/67659)
-- Support for composite operators. [#69121](https://github.com/PaddlePaddle/Paddle/pull/69121), [#69144](https://github.com/PaddlePaddle/Paddle/pull/69144), [#70204](https://github.com/PaddlePaddle/Paddle/pull/70204), [#71098](https://github.com/PaddlePaddle/Paddle/pull/71098), [#71335](https://github.com/PaddlePaddle/Paddle/pull/71335)
-- Support for CINN compiler execution. [#69589](https://github.com/PaddlePaddle/Paddle/pull/69589), [#70115](https://github.com/PaddlePaddle/Paddle/pull/70115)
-- Support for custom devices. [#70909](https://github.com/PaddlePaddle/Paddle/pull/70909), [#71294](https://github.com/PaddlePaddle/Paddle/pull/71294), [#71362](https://github.com/PaddlePaddle/Paddle/pull/71362), [#71010](https://github.com/PaddlePaddle/Paddle/pull/71010), [#71036](https://github.com/PaddlePaddle/Paddle/pull/71036), [#70637](https://github.com/PaddlePaddle/Paddle/pull/70637), [#71085](https://github.com/PaddlePaddle/Paddle/pull/71085)
-- Execution support for other scenarios. [#65050](https://github.com/PaddlePaddle/Paddle/pull/65050), [#65664](https://github.com/PaddlePaddle/Paddle/pull/65664), [#65741](https://github.com/PaddlePaddle/Paddle/pull/65741), [#65786](https://github.com/PaddlePaddle/Paddle/pull/65786), [#65499](https://github.com/PaddlePaddle/Paddle/pull/65499), [#66441](https://github.com/PaddlePaddle/Paddle/pull/66441), [#67668](https://github.com/PaddlePaddle/Paddle/pull/67668), [#68199](https://github.com/PaddlePaddle/Paddle/pull/68199), [#69088](https://github.com/PaddlePaddle/Paddle/pull/69088), [#70199](https://github.com/PaddlePaddle/Paddle/pull/70199), [#70308](https://github.com/PaddlePaddle/Paddle/pull/70308), [#70709](https://github.com/PaddlePaddle/Paddle/pull/70709), [#70937](https://github.com/PaddlePaddle/Paddle/pull/70937), [#71066](https://github.com/PaddlePaddle/Paddle/pull/71066), [#71079](https://github.com/PaddlePaddle/Paddle/pull/71079), [#71121](https://github.com/PaddlePaddle/Paddle/pull/71121), [#71136](https://github.com/PaddlePaddle/Paddle/pull/71136), [#71205](https://github.com/PaddlePaddle/Paddle/pull/71205)
-
-### New Features
+### Devs
-- SOT adapts to Python 3.13 bytecode, supporting static graph conversion (SOT mode) under Python 3.13. [#68071](https://github.com/PaddlePaddle/Paddle/pull/68071), [#69126](https://github.com/PaddlePaddle/Paddle/pull/69126), [#69131](https://github.com/PaddlePaddle/Paddle/pull/69131), [#69196](https://github.com/PaddlePaddle/Paddle/pull/69196), [#69232](https://github.com/PaddlePaddle/Paddle/pull/69232), [#69253](https://github.com/PaddlePaddle/Paddle/pull/69253), [#69267](https://github.com/PaddlePaddle/Paddle/pull/69267), [#69412](https://github.com/PaddlePaddle/Paddle/pull/69412), [#69431](https://github.com/PaddlePaddle/Paddle/pull/69431), [#69432](https://github.com/PaddlePaddle/Paddle/pull/69432), [#69436](https://github.com/PaddlePaddle/Paddle/pull/69436), [#69557](https://github.com/PaddlePaddle/Paddle/pull/69557), [#69567](https://github.com/PaddlePaddle/Paddle/pull/69567), [#69700](https://github.com/PaddlePaddle/Paddle/pull/69700), [#69707](https://github.com/PaddlePaddle/Paddle/pull/69707), [#69735](https://github.com/PaddlePaddle/Paddle/pull/69735), [#69738](https://github.com/PaddlePaddle/Paddle/pull/69738), [#69744](https://github.com/PaddlePaddle/Paddle/pull/69744), [#69753](https://github.com/PaddlePaddle/Paddle/pull/69753), [#69887](https://github.com/PaddlePaddle/Paddle/pull/69887), [#69920](https://github.com/PaddlePaddle/Paddle/pull/69920), [#69950](https://github.com/PaddlePaddle/Paddle/pull/69950), [#70319](https://github.com/PaddlePaddle/Paddle/pull/70319), [#70927](https://github.com/PaddlePaddle/Paddle/pull/70927)
-- Support for custom devices. [#68061](https://github.com/PaddlePaddle/Paddle/pull/68061), [#68836](https://github.com/PaddlePaddle/Paddle/pull/68836), [#70366](https://github.com/PaddlePaddle/Paddle/pull/70366), [#70549](https://github.com/PaddlePaddle/Paddle/pull/70549)
-- Adapted PIR forward execution. [#65335](https://github.com/PaddlePaddle/Paddle/pull/65335)
-- Support save/load. [#67910](https://github.com/PaddlePaddle/Paddle/pull/67910)
-- Adapted to pylayer. [#70335](https://github.com/PaddlePaddle/Paddle/pull/70335)
-- Adapt lazy_init. [#67379](https://github.com/PaddlePaddle/Paddle/pull/67379), [#67467](https://github.com/PaddlePaddle/Paddle/pull/67467)
-- Optimize the logic under PIR. [#67961](https://github.com/PaddlePaddle/Paddle/pull/67961)
-- Support for other scenarios. [#68344](https://github.com/PaddlePaddle/Paddle/pull/68344), [#70071](https://github.com/PaddlePaddle/Paddle/pull/70071), [#70291](https://github.com/PaddlePaddle/Paddle/pull/70291), [#70752](https://github.com/PaddlePaddle/Paddle/pull/70752), [#70812](https://github.com/PaddlePaddle/Paddle/pull/70812), [#71033](https://github.com/PaddlePaddle/Paddle/pull/71033)
+- Updated code style check rules. [#72896](https://github.com/PaddlePaddle/Paddle/pull/72896), [#73179](https://github.com/PaddlePaddle/Paddle/pull/73179), [#73060](https://github.com/PaddlePaddle/Paddle/pull/73060), [#72553](https://github.com/PaddlePaddle/Paddle/pull/72553), [#72915](https://github.com/PaddlePaddle/Paddle/pull/72915), [#72916](https://github.com/PaddlePaddle/Paddle/pull/72916), [#73338](https://github.com/PaddlePaddle/Paddle/pull/73338), [#72935](https://github.com/PaddlePaddle/Paddle/pull/72935), [#72325](https://github.com/PaddlePaddle/Paddle/pull/72325)
+- Code variable naming updates and code migration. [#73048](https://github.com/PaddlePaddle/Paddle/pull/73048), [#73148](https://github.com/PaddlePaddle/Paddle/pull/73148), [#73149](https://github.com/PaddlePaddle/Paddle/pull/73149), [#73264](https://github.com/PaddlePaddle/Paddle/pull/73264), [#73159](https://github.com/PaddlePaddle/Paddle/pull/73159), [#73124](https://github.com/PaddlePaddle/Paddle/pull/73124), [#73160](https://github.com/PaddlePaddle/Paddle/pull/73160), [#73161](https://github.com/PaddlePaddle/Paddle/pull/73161), [#73374](https://github.com/PaddlePaddle/Paddle/pull/73374), [#73395](https://github.com/PaddlePaddle/Paddle/pull/73395), [#73076](https://github.com/PaddlePaddle/Paddle/pull/73076), [#73163](https://github.com/PaddlePaddle/Paddle/pull/73163), [#73255](https://github.com/PaddlePaddle/Paddle/pull/73255)
+- LoDTensor is being phased out. [#71968](https://github.com/PaddlePaddle/Paddle/pull/71968), [#72152](https://github.com/PaddlePaddle/Paddle/pull/72152), [#72145](https://github.com/PaddlePaddle/Paddle/pull/72145)
-### Changes unrelated to ordinary users
+### Deprecations
-- Optimize SOT debugging for experience and improve development efficiency. [#67560](https://github.com/PaddlePaddle/Paddle/pull/67560), [#69072](https://github.com/PaddlePaddle/Paddle/pull/69072), [#69837](https://github.com/PaddlePaddle/Paddle/pull/69837), [#70134](https://github.com/PaddlePaddle/Paddle/pull/70134), [#70387](https://github.com/PaddlePaddle/Paddle/pull/70387), [#70740](https://github.com/PaddlePaddle/Paddle/pull/70740), [#71118](https://github.com/PaddlePaddle/Paddle/pull/71118), [#71268](https://github.com/PaddlePaddle/Paddle/pull/71268), [#71275](https://github.com/PaddlePaddle/Paddle/pull/71275), [#71458](https://github.com/PaddlePaddle/Paddle/pull/71458), [#71460](https://github.com/PaddlePaddle/Paddle/pull/71460)
-- Other changes unrelated to user usage. [#65393](https://github.com/PaddlePaddle/Paddle/pull/65393), [#65795](https://github.com/PaddlePaddle/Paddle/pull/65795), [#65799](https://github.com/PaddlePaddle/Paddle/pull/65799), [#65911](https://github.com/PaddlePaddle/Paddle/pull/65911), [#65977](https://github.com/PaddlePaddle/Paddle/pull/65977), [#66982](https://github.com/PaddlePaddle/Paddle/pull/66982), [#67563](https://github.com/PaddlePaddle/Paddle/pull/67563), [#68761](https://github.com/PaddlePaddle/Paddle/pull/68761), [#68909](https://github.com/PaddlePaddle/Paddle/pull/68909), [#69130](https://github.com/PaddlePaddle/Paddle/pull/69130), [#69233](https://github.com/PaddlePaddle/Paddle/pull/69233), [#69956](https://github.com/PaddlePaddle/Paddle/pull/69956), [#71142](https://github.com/PaddlePaddle/Paddle/pull/71142)
+- Removed unused code. [#71795](https://github.com/PaddlePaddle/Paddle/pull/71795), [#71792](https://github.com/PaddlePaddle/Paddle/pull/71792), [#71794](https://github.com/PaddlePaddle/Paddle/pull/71794), [#71793](https://github.com/PaddlePaddle/Paddle/pull/71793), [#72265](https://github.com/PaddlePaddle/Paddle/pull/72265), [#73167](https://github.com/PaddlePaddle/Paddle/pull/73167), [#73115](https://github.com/PaddlePaddle/Paddle/pull/73115), [#73049](https://github.com/PaddlePaddle/Paddle/pull/73049), [#72162](https://github.com/PaddlePaddle/Paddle/pull/72162), [#72321](https://github.com/PaddlePaddle/Paddle/pull/72321), [#72336](https://github.com/PaddlePaddle/Paddle/pull/72336), [#72952](https://github.com/PaddlePaddle/Paddle/pull/72952), [#72828](https://github.com/PaddlePaddle/Paddle/pull/72828)
-### Security Issues
-
-- Introduced approval rules for IR (Intermediate Representation) save/load operations to enhance security and governance during model serialization. [#65737](https://github.com/PaddlePaddle/Paddle/pull/65737)
-
-### Others
+## 2. Execution architecture
-- Sparse API migration. [#66139](https://github.com/PaddlePaddle/Paddle/pull/66139), [#66319](https://github.com/PaddlePaddle/Paddle/pull/66319), [#66866](https://github.com/PaddlePaddle/Paddle/pull/66866)
-- PIR function extension. [#67966](https://github.com/PaddlePaddle/Paddle/pull/67966), [#69909](https://github.com/PaddlePaddle/Paddle/pull/69909)
-- Migrate file locations. [#66477](https://github.com/PaddlePaddle/Paddle/pull/66477), [#66824](https://github.com/PaddlePaddle/Paddle/pull/66824), [#67592](https://github.com/PaddlePaddle/Paddle/pull/67592)
-- Log addition. [#68382](https://github.com/PaddlePaddle/Paddle/pull/68382), [#70506](https://github.com/PaddlePaddle/Paddle/pull/70506)
-- Enable PIR by default. [#68278](https://github.com/PaddlePaddle/Paddle/pull/68278)
-- Header file organization. [#68422](https://github.com/PaddlePaddle/Paddle/pull/68422), [#68471](https://github.com/PaddlePaddle/Paddle/pull/68471)
-- Compilation optimization. [#67831](https://github.com/PaddlePaddle/Paddle/pull/67831), [#67821](https://github.com/PaddlePaddle/Paddle/pull/67821), [#68717](https://github.com/PaddlePaddle/Paddle/pull/68717)
-- Manage related tests with guards. [#67816](https://github.com/PaddlePaddle/Paddle/pull/67816), [#67827](https://github.com/PaddlePaddle/Paddle/pull/67827), [#67989](https://github.com/PaddlePaddle/Paddle/pull/67989)
-- Fixed spelling errors. [#70784](https://github.com/PaddlePaddle/Paddle/pull/70784), [#70787](https://github.com/PaddlePaddle/Paddle/pull/70787)
-- Check for CUDA errors. [#70399](https://github.com/PaddlePaddle/Paddle/pull/70399)
-
-### Developer
-
-- Fix issues in dynamic-to-static conversion. Improve overall graph conversion success rate and optimize inference export experience. [#65291](https://github.com/PaddlePaddle/Paddle/pull/65291), [#66153](https://github.com/PaddlePaddle/Paddle/pull/66153), [#66379](https://github.com/PaddlePaddle/Paddle/pull/66379), [#66557](https://github.com/PaddlePaddle/Paddle/pull/66557), [#67021](https://github.com/PaddlePaddle/Paddle/pull/67021), [#67482](https://github.com/PaddlePaddle/Paddle/pull/67482), [#67495](https://github.com/PaddlePaddle/Paddle/pull/67495), [#67981](https://github.com/PaddlePaddle/Paddle/pull/67981), [#68030](https://github.com/PaddlePaddle/Paddle/pull/68030), [#68078](https://github.com/PaddlePaddle/Paddle/pull/68078), [#68328](https://github.com/PaddlePaddle/Paddle/pull/68328), [#68442](https://github.com/PaddlePaddle/Paddle/pull/68442), [#68679](https://github.com/PaddlePaddle/Paddle/pull/68679), [#68850](https://github.com/PaddlePaddle/Paddle/pull/68850), [#68892](https://github.com/PaddlePaddle/Paddle/pull/68892), [#68991](https://github.com/PaddlePaddle/Paddle/pull/68991), [#69043](https://github.com/PaddlePaddle/Paddle/pull/69043), [#69097](https://github.com/PaddlePaddle/Paddle/pull/69097), [#69210](https://github.com/PaddlePaddle/Paddle/pull/69210), [#69295](https://github.com/PaddlePaddle/Paddle/pull/69295), [#69428](https://github.com/PaddlePaddle/Paddle/pull/69428), [#69518](https://github.com/PaddlePaddle/Paddle/pull/69518), [#69642](https://github.com/PaddlePaddle/Paddle/pull/69642), [#69940](https://github.com/PaddlePaddle/Paddle/pull/69940), [#70118](https://github.com/PaddlePaddle/Paddle/pull/70118), [#70169](https://github.com/PaddlePaddle/Paddle/pull/70169), [#70218](https://github.com/PaddlePaddle/Paddle/pull/70218), [#70287](https://github.com/PaddlePaddle/Paddle/pull/70287), [#70412](https://github.com/PaddlePaddle/Paddle/pull/70412), [#71099](https://github.com/PaddlePaddle/Paddle/pull/71099), 
[#71156](https://github.com/PaddlePaddle/Paddle/pull/71156), [#71193](https://github.com/PaddlePaddle/Paddle/pull/71193), [#71336](https://github.com/PaddlePaddle/Paddle/pull/71336), [#71463](https://github.com/PaddlePaddle/Paddle/pull/71463), [#71476](https://github.com/PaddlePaddle/Paddle/pull/71476), [#71503](https://github.com/PaddlePaddle/Paddle/pull/71503)
-- Inplace strategy upgrade. [#65491](https://github.com/PaddlePaddle/Paddle/pull/65491)
-- Control flow related development. [#67251](https://github.com/PaddlePaddle/Paddle/pull/67251)
-- Add environment variables. [#68467](https://github.com/PaddlePaddle/Paddle/pull/68467)
-- Support sparse operator operations. [#67111](https://github.com/PaddlePaddle/Paddle/pull/67111)
-- Other execution support development, including logic optimization, version adaptation, and adding unit tests. [#69241](https://github.com/PaddlePaddle/Paddle/pull/69241), [#69806](https://github.com/PaddlePaddle/Paddle/pull/69806), [#70768](https://github.com/PaddlePaddle/Paddle/pull/70768), [#66829](https://github.com/PaddlePaddle/Paddle/pull/66829), [#67110](https://github.com/PaddlePaddle/Paddle/pull/67110), [#67442](https://github.com/PaddlePaddle/Paddle/pull/67442), [#67041](https://github.com/PaddlePaddle/Paddle/pull/67041), [#67452](https://github.com/PaddlePaddle/Paddle/pull/67452), [#69061](https://github.com/PaddlePaddle/Paddle/pull/69061), [#69307](https://github.com/PaddlePaddle/Paddle/pull/69307), [#68669](https://github.com/PaddlePaddle/Paddle/pull/68669), [#69829](https://github.com/PaddlePaddle/Paddle/pull/69829), [#70003](https://github.com/PaddlePaddle/Paddle/pull/70003), [#70443](https://github.com/PaddlePaddle/Paddle/pull/70443), [#70364](https://github.com/PaddlePaddle/Paddle/pull/70364), [#71495](https://github.com/PaddlePaddle/Paddle/pull/71495)
-
-### Performance optimization
-
-- Optimize dynamic shape handling in static graph conversion, reducing graph construction iterations and compilation time. [#65235](https://github.com/PaddlePaddle/Paddle/pull/65235), [#65477](https://github.com/PaddlePaddle/Paddle/pull/65477), [#65517](https://github.com/PaddlePaddle/Paddle/pull/65517), [#65882](https://github.com/PaddlePaddle/Paddle/pull/65882), [#66346](https://github.com/PaddlePaddle/Paddle/pull/66346), [#66746](https://github.com/PaddlePaddle/Paddle/pull/66746), [#67786](https://github.com/PaddlePaddle/Paddle/pull/67786), [#67876](https://github.com/PaddlePaddle/Paddle/pull/67876), [#68113](https://github.com/PaddlePaddle/Paddle/pull/68113), [#68302](https://github.com/PaddlePaddle/Paddle/pull/68302), [#68337](https://github.com/PaddlePaddle/Paddle/pull/68337), [#68616](https://github.com/PaddlePaddle/Paddle/pull/68616), [#69354](https://github.com/PaddlePaddle/Paddle/pull/69354), [#70009](https://github.com/PaddlePaddle/Paddle/pull/70009), [#70877](https://github.com/PaddlePaddle/Paddle/pull/70877)
-- End-to-end performance optimization for SOT, minimizing subgraph fragmentation, reducing scheduling overhead, and improving static training efficiency. [#67591](https://github.com/PaddlePaddle/Paddle/pull/67591), [#67746](https://github.com/PaddlePaddle/Paddle/pull/67746), [#67823](https://github.com/PaddlePaddle/Paddle/pull/67823), [#67890](https://github.com/PaddlePaddle/Paddle/pull/67890), [#67921](https://github.com/PaddlePaddle/Paddle/pull/67921), [#68031](https://github.com/PaddlePaddle/Paddle/pull/68031), [#68153](https://github.com/PaddlePaddle/Paddle/pull/68153), [#68729](https://github.com/PaddlePaddle/Paddle/pull/68729), [#69249](https://github.com/PaddlePaddle/Paddle/pull/69249), [#69263](https://github.com/PaddlePaddle/Paddle/pull/69263), [#69300](https://github.com/PaddlePaddle/Paddle/pull/69300), [#69313](https://github.com/PaddlePaddle/Paddle/pull/69313), [#69325](https://github.com/PaddlePaddle/Paddle/pull/69325), [#69353](https://github.com/PaddlePaddle/Paddle/pull/69353), [#69411](https://github.com/PaddlePaddle/Paddle/pull/69411), [#69506](https://github.com/PaddlePaddle/Paddle/pull/69506), [#69672](https://github.com/PaddlePaddle/Paddle/pull/69672), [#69746](https://github.com/PaddlePaddle/Paddle/pull/69746), [#69834](https://github.com/PaddlePaddle/Paddle/pull/69834), [#69836](https://github.com/PaddlePaddle/Paddle/pull/69836), [#69852](https://github.com/PaddlePaddle/Paddle/pull/69852), [#69975](https://github.com/PaddlePaddle/Paddle/pull/69975), [#70151](https://github.com/PaddlePaddle/Paddle/pull/70151), [#70293](https://github.com/PaddlePaddle/Paddle/pull/70293), [#70405](https://github.com/PaddlePaddle/Paddle/pull/70405), [#70851](https://github.com/PaddlePaddle/Paddle/pull/70851), [#71039](https://github.com/PaddlePaddle/Paddle/pull/71039), [#71254](https://github.com/PaddlePaddle/Paddle/pull/71254), [#71295](https://github.com/PaddlePaddle/Paddle/pull/71295), [#71298](https://github.com/PaddlePaddle/Paddle/pull/71298), 
[#71346](https://github.com/PaddlePaddle/Paddle/pull/71346), [#71377](https://github.com/PaddlePaddle/Paddle/pull/71377), [#71407](https://github.com/PaddlePaddle/Paddle/pull/71407)
-- Optimize the performance of dynamic shape scenarios. [#68491](https://github.com/PaddlePaddle/Paddle/pull/68491), [#68629](https://github.com/PaddlePaddle/Paddle/pull/68629)
-- Accelerate the execution speed of PIR executor. [#69513](https://github.com/PaddlePaddle/Paddle/pull/69513)
-- Optimize PIR saving and loading performance. [#69683](https://github.com/PaddlePaddle/Paddle/pull/69683)
-- Optimize for device. [#69676](https://github.com/PaddlePaddle/Paddle/pull/69676)
-- Clean up redundant input and output information. [#66278](https://github.com/PaddlePaddle/Paddle/pull/66278)
-
-### Discontinued Features
-
-- Remove outdated test cases. [#66269](https://github.com/PaddlePaddle/Paddle/pull/66269), [#66690](https://github.com/PaddlePaddle/Paddle/pull/66690), [#67505](https://github.com/PaddlePaddle/Paddle/pull/67505), [#67464](https://github.com/PaddlePaddle/Paddle/pull/67464), [#68400](https://github.com/PaddlePaddle/Paddle/pull/68400), [#68178](https://github.com/PaddlePaddle/Paddle/pull/68178), [#68194](https://github.com/PaddlePaddle/Paddle/pull/68194)
-- Clean up obsolete flags and configurations. [#69124](https://github.com/PaddlePaddle/Paddle/pull/69124), [#69176](https://github.com/PaddlePaddle/Paddle/pull/69176), [#69274](https://github.com/PaddlePaddle/Paddle/pull/69274), [#68384](https://github.com/PaddlePaddle/Paddle/pull/68384)
-- Eliminate old APIs. [#66032](https://github.com/PaddlePaddle/Paddle/pull/66032), [#67303](https://github.com/PaddlePaddle/Paddle/pull/67303)
-- Clean up redundant PIR strategies and unit tests. [#66366](https://github.com/PaddlePaddle/Paddle/pull/66366), [#70534](https://github.com/PaddlePaddle/Paddle/pull/70534), [#68444](https://github.com/PaddlePaddle/Paddle/pull/68444), [#70599](https://github.com/PaddlePaddle/Paddle/pull/70599), [#68801](https://github.com/PaddlePaddle/Paddle/pull/68801), [#66303](https://github.com/PaddlePaddle/Paddle/pull/66303), [#67854](https://github.com/PaddlePaddle/Paddle/pull/67854), [#70795](https://github.com/PaddlePaddle/Paddle/pull/70795)
-- Discard the related unit tests and APIs for dynamic-to-static conversion. [#66421](https://github.com/PaddlePaddle/Paddle/pull/66421), [#68251](https://github.com/PaddlePaddle/Paddle/pull/68251), [#68252](https://github.com/PaddlePaddle/Paddle/pull/68252), [#68253](https://github.com/PaddlePaddle/Paddle/pull/68253), [#68254](https://github.com/PaddlePaddle/Paddle/pull/68254), [#68409](https://github.com/PaddlePaddle/Paddle/pull/68409), [#70569](https://github.com/PaddlePaddle/Paddle/pull/70569), [#71279](https://github.com/PaddlePaddle/Paddle/pull/71279)
-- Discard the related unit tests for automatic parallelism. [#67857](https://github.com/PaddlePaddle/Paddle/pull/67857), [#67862](https://github.com/PaddlePaddle/Paddle/pull/67862), [#67995](https://github.com/PaddlePaddle/Paddle/pull/67995), [#68012](https://github.com/PaddlePaddle/Paddle/pull/68012), [#68013](https://github.com/PaddlePaddle/Paddle/pull/68013), [#67798](https://github.com/PaddlePaddle/Paddle/pull/67798)
-
-## 3. Compiler architecture
-
-The CINN compiler has seen comprehensive improvements in completeness and performance. In this version, we have conducted thorough optimizations across all aspects of the compiler's front-end and back-end: including the addition of an automatic Re-Compute mechanism for reverse computation graphs, front-end Pass performance optimization, symbol derivation mechanism upgrades, operator fusion strategy optimization, back-end Schedule strategy, and enhanced subscript expression simplification capabilities. At the same time, we have investigated and fixed a large number of correctness and performance issues, systematically enhancing the compiler's general optimization capabilities. When the CINN compiler is enabled for the PaddlePaddle PaddleX series models, over 60% of the models show significant performance improvements compared to dynamic graph mode.
+This release supports FP8 matrix operations to improve model training efficiency, improves the stability of multiple models, and provides a C_ops-style interface for invoking backward computation, which facilitates memory optimization and functional experimentation.
### New Features
-1. New hardware backend support: Added support for two new backends, HIP and SYCL. ([#65146](https://github.com/PaddlePaddle/Paddle/pull/65146), [#65329](https://github.com/PaddlePaddle/Paddle/pull/65329), [#69554](https://github.com/PaddlePaddle/Paddle/pull/69554), [#71204](https://github.com/PaddlePaddle/Paddle/pull/71204), [#65438](https://github.com/PaddlePaddle/Paddle/pull/65438), [#66476](https://github.com/PaddlePaddle/Paddle/pull/66476), [#66620](https://github.com/PaddlePaddle/Paddle/pull/66620), [#67813](https://github.com/PaddlePaddle/Paddle/pull/67813))
-2. Added support for manually setting value ranges, equality constraints, and other information for symbolic dimensions in inference scenarios. ([#67628](https://github.com/PaddlePaddle/Paddle/pull/67628), [#67384](https://github.com/PaddlePaddle/Paddle/pull/67384))
-
-### Function optimization
-
-1. Optimize the printing of error messages to enhance the development and debugging experience. ([#67738](https://github.com/PaddlePaddle/Paddle/pull/67738), [#68769](https://github.com/PaddlePaddle/Paddle/pull/68769), [#71076](https://github.com/PaddlePaddle/Paddle/pull/71076))
-2. Support the Welford algorithm, which preserves both the performance and the accuracy of BatchNorm-related operator Kernels. ([#71184](https://github.com/PaddlePaddle/Paddle/pull/71184), [#71057](https://github.com/PaddlePaddle/Paddle/pull/71057))
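
For readers unfamiliar with the Welford algorithm mentioned in the list above, the following is a minimal pure-Python sketch of the single-pass mean/variance update and of the pairwise merge step that makes it suitable for parallel reductions. It is an illustration only: the names `WelfordState` and `welford` are hypothetical and this is not the CINN kernel implementation.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class WelfordState:
    """Running statistics (hypothetical helper, not a Paddle API)."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        # Single-pass update: avoids the cancellation error of E[x^2] - E[x]^2.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "WelfordState") -> "WelfordState":
        # Combine two partial results, e.g. from two threads or blocks.
        if other.count == 0:
            return WelfordState(self.count, self.mean, self.m2)
        if self.count == 0:
            return WelfordState(other.count, other.mean, other.m2)
        total = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / total
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / total
        return WelfordState(total, mean, m2)

    @property
    def variance(self) -> float:
        # Population (biased) variance, the form batch normalization uses.
        return self.m2 / self.count if self.count else 0.0


def welford(values: Iterable[float]) -> WelfordState:
    state = WelfordState()
    for v in values:
        state.update(v)
    return state
```

Because `merge` combines partial states without another pass over the data, each partial reduction can run independently and be combined afterwards, which is the property a parallel BatchNorm reduction relies on.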
-
-### Performance optimization
-
-1. New backend optimization strategies such as GridReduce, Loop merging, Transpose tuning, and automatic vectorization have been added, significantly enhancing Kernel performance across various dimensional spaces and under different hardware configurations in all scenarios. ([#67236](https://github.com/PaddlePaddle/Paddle/pull/67236), [#68897](https://github.com/PaddlePaddle/Paddle/pull/68897), [#69409](https://github.com/PaddlePaddle/Paddle/pull/69409), [#65336](https://github.com/PaddlePaddle/Paddle/pull/65336), [#66419](https://github.com/PaddlePaddle/Paddle/pull/66419), [#68338](https://github.com/PaddlePaddle/Paddle/pull/68338), [#68364](https://github.com/PaddlePaddle/Paddle/pull/68364), [#71087](https://github.com/PaddlePaddle/Paddle/pull/71087), [#68019](https://github.com/PaddlePaddle/Paddle/pull/68019), [#68122](https://github.com/PaddlePaddle/Paddle/pull/68122), [#65187](https://github.com/PaddlePaddle/Paddle/pull/65187), [#66742](https://github.com/PaddlePaddle/Paddle/pull/66742), [#67083](https://github.com/PaddlePaddle/Paddle/pull/67083), [#68667](https://github.com/PaddlePaddle/Paddle/pull/68667), [#68750](https://github.com/PaddlePaddle/Paddle/pull/68750), [#69376](https://github.com/PaddlePaddle/Paddle/pull/69376), [#69350](https://github.com/PaddlePaddle/Paddle/pull/69350), [#69740](https://github.com/PaddlePaddle/Paddle/pull/69740), [#68918](https://github.com/PaddlePaddle/Paddle/pull/68918), [#70092](https://github.com/PaddlePaddle/Paddle/pull/70092), [#69607](https://github.com/PaddlePaddle/Paddle/pull/69607), [#69794](https://github.com/PaddlePaddle/Paddle/pull/69794), [#70258](https://github.com/PaddlePaddle/Paddle/pull/70258), [#70547](https://github.com/PaddlePaddle/Paddle/pull/70547), [#70581](https://github.com/PaddlePaddle/Paddle/pull/70581), [#70649](https://github.com/PaddlePaddle/Paddle/pull/70649), [#69732](https://github.com/PaddlePaddle/Paddle/pull/69732), [#70786](https://github.com/PaddlePaddle/Paddle/pull/70786), 
[#70942](https://github.com/PaddlePaddle/Paddle/pull/70942), [#71014](https://github.com/PaddlePaddle/Paddle/pull/71014), [#71263](https://github.com/PaddlePaddle/Paddle/pull/71263), [#71249](https://github.com/PaddlePaddle/Paddle/pull/71249), [#71340](https://github.com/PaddlePaddle/Paddle/pull/71340), [#71301](https://github.com/PaddlePaddle/Paddle/pull/71301), [#71380](https://github.com
-2. Optimize operator fusion strategies, upgrading various strategies including horizontal fusion, multi-downstream fusion, Reshape alignment fusion, etc., to further enhance the fusion capabilities of operators and improve end-to-end optimization performance. ([#66034](https://github.com/PaddlePaddle/Paddle/pull/66034), [#67829](https://github.com/PaddlePaddle/Paddle/pull/67829), [#68171](https://github.com/PaddlePaddle/Paddle/pull/68171), [#69478](https://github.com/PaddlePaddle/Paddle/pull/69478), [#69691](https://github.com/PaddlePaddle/Paddle/pull/69691), [#70665](https://github.com/PaddlePaddle/Paddle/pull/70665), [#71103](https://github.com/PaddlePaddle/Paddle/pull/71103), [#70873](https://github.com/PaddlePaddle/Paddle/pull/70873))
-3. The simplification capability of backend subscript expressions has been upgraded, supporting the simplification of complex expressions with dynamic and static dimensions, significantly reducing the subscript computation overhead in the generated backend Kernel. ([#68011](https://github.com/PaddlePaddle/Paddle/pull/68011), [#68617](https://github.com/PaddlePaddle/Paddle/pull/68617), [#68624](https://github.com/PaddlePaddle/Paddle/pull/68624), [#68685](https://github.com/PaddlePaddle/Paddle/pull/68685), [#68220](https://github.com/PaddlePaddle/Paddle/pull/68220), [#68720](https://github.com/PaddlePaddle/Paddle/pull/68720), [#68753](https://github.com/PaddlePaddle/Paddle/pull/68753), [#68986](https://github.com/PaddlePaddle/Paddle/pull/68986), [#68987](https://github.com/PaddlePaddle/Paddle/pull/68987), [#69071](https://github.com/PaddlePaddle/Paddle/pull/69071), [#69164](https://github.com/PaddlePaddle/Paddle/pull/69164), [#69282](https://github.com/PaddlePaddle/Paddle/pull/69282), [#69522](https://github.com/PaddlePaddle/Paddle/pull/69522), [#69857](https://github.com/PaddlePaddle/Paddle/pull/69857), [#70208](https://github.com/PaddlePaddle/Paddle/pull/70208), [#70355](https://github.com/PaddlePaddle/Paddle/pull/70355), [#70427](https://github.com/PaddlePaddle/Paddle/pull/70208), [#70450](https://github.com/PaddlePaddle/Paddle/pull/70450), [#68737](https://github.com/PaddlePaddle/Paddle/pull/68737), [#70500](https://github.com/PaddlePaddle/Paddle/pull/70500), [#70953](https://github.com/PaddlePaddle/Paddle/pull/70953), [#70933](https://github.com/PaddlePaddle/Paddle/pull/70933), [#71026](https://github.com/PaddlePaddle/Paddle/pull/71026), [#70456](https://github.com/PaddlePaddle/Paddle/pull/70456), [#70257](https://github.com/PaddlePaddle/Paddle/pull/70257), [#70461](https://github.com/PaddlePaddle/Paddle/pull/70461), [#70142](https://github.com/PaddlePaddle/Paddle/pull/70142), [#71018](https://github.com/PaddlePaddle/Paddle/pull/71018), 
[#71278](https://github.com/PaddlePaddle/Paddle/pull/71278))
-4. A new automatic Re-Compute mechanism for reverse computation graphs has been added, which can effectively reduce model training memory usage and improve performance. ([#69342](https://github.com/PaddlePaddle/Paddle/pull/69342), [#70255](https://github.com/PaddlePaddle/Paddle/pull/70255), [#68241](https://github.com/PaddlePaddle/Paddle/pull/68241), [#69954](https://github.com/PaddlePaddle/Paddle/pull/69954), [#70832](https://github.com/PaddlePaddle/Paddle/pull/70832))
-5. Optimize the backend Host and Device code compilation process to reduce compilation time and improve the processing performance of branches in the Broadcast scenario. ([#65669](https://github.com/PaddlePaddle/Paddle/pull/65669), [#65916](https://github.com/PaddlePaddle/Paddle/pull/65916), [#66109](https://github.com/PaddlePaddle/Paddle/pull/66109), [#65611](https://github.com/PaddlePaddle/Paddle/pull/65611), [#65990](https://github.com/PaddlePaddle/Paddle/pull/65990), [#66088](https://github.com/PaddlePaddle/Paddle/pull/66088), [#66207](https://github.com/PaddlePaddle/Paddle/pull/66207), [#66537](https://github.com/PaddlePaddle/Paddle/pull/66537), [#66768](https://github.com/PaddlePaddle/Paddle/pull/66768), [#70685](https://github.com/PaddlePaddle/Paddle/pull/70685), [#71410](https://github.com/PaddlePaddle/Paddle/pull/71410), [#66062](https://github.com/PaddlePaddle/Paddle/pull/66062))
-6. Improved and upgraded the mechanisms for symbol derivation, simplification, and caching in dynamic dimensions, added symbol derivation interface implementations for all conventional operators (580+), and provided more constraint information for Kernel compilation.([#65343](https://github.com/PaddlePaddle/Paddle/pull/65343)、[#66582](https://github.com/PaddlePaddle/Paddle/pull/66582)、[#65500](https://github.com/PaddlePaddle/Paddle/pull/65500)、[#65591](https://github.com/PaddlePaddle/Paddle/pull/65591)、[#66637](https://github.com/PaddlePaddle/Paddle/pull/66637)、[#68208](https://github.com/PaddlePaddle/Paddle/pull/68208)、[#68056](https://github.com/PaddlePaddle/Paddle/pull/68056)、[#68015](https://github.com/PaddlePaddle/Paddle/pull/68015)、[#68096](https://github.com/PaddlePaddle/Paddle/pull/68096)、[#68236](https://github.com/PaddlePaddle/Paddle/pull/68236)、[#68973](https://github.com/PaddlePaddle/Paddle/pull/68973)、[#68967](https://github.com/PaddlePaddle/Paddle/pull/68967)、[#69133](https://github.com/PaddlePaddle/Paddle/pull/69133)、[#68550](https://github.com/PaddlePaddle/Paddle/pull/68550)、[#68882](https://github.com/PaddlePaddle/Paddle/pull/68882)、[#69005](https://github.com/PaddlePaddle/Paddle/pull/69005)、[#69911](https://github.com/PaddlePaddle/Paddle/pull/69911)、[#70376](https://github.com/PaddlePaddle/Paddle/pull/70376)、[#71153](https://github.com/PaddlePaddle/Paddle/pull/71153)、[#66644](https://github.com/PaddlePaddle/Paddle/pull/66644)、[#66650](https://github.com/PaddlePaddle/Paddle/pull/66650)、[#66642](https://github.com/PaddlePaddle/Paddle/pull/66642)、[#66729](https://github.com/PaddlePaddle/Paddle/pull/66729)、[#66838](https://github.com/PaddlePaddle/Paddle/pull/66838)、[#66762](https://github.com/PaddlePaddle/Paddle/pull/66762)、[#66580](https://github.com/PaddlePaddle/Paddle/pull/66580)、[#66612](https://github.com/PaddlePaddle/Paddle/pull/66612)、[#66625](https://github.com/PaddlePaddle/Paddle/pull/66625)、[#66643](https://github.com/PaddlePaddle/Paddle/pul
l/66643)、[#66837](https://github.com/PaddlePaddle/Paddle/pull/66837)、[#66946](https://github.com/PaddlePaddle/Paddle/pull/66946)、[#67018](https://github.com/PaddlePaddle/Paddle/pull/67018)、[#67049](https://github.com/PaddlePaddle/Paddle/pull/67049)、[#66956](https://github.com/PaddlePaddle/Paddle/pull/66956)、[#67008](https://github.com/PaddlePaddle/Paddle/pull/67008)、[#66930](https://github.com/PaddlePaddle/Paddle/pull/66930)、[#66877](https://github.com/PaddlePaddle/Paddle/pull/66877)、[#66896](https://github.com/PaddlePaddle/Paddle/pull/66896)、[#67120](https://github.com/PaddlePaddle/Paddle/pull/67120)、[#67117](https://github.com/PaddlePaddle/Paddle/pull/67117)、[#67098](https://github.com/PaddlePaddle/Paddle/pull/67098)、[#67136](https://github.com/PaddlePaddle/Paddle/pull/67136)、[#67294](https://github.com/PaddlePaddle/Paddle/pull/67294)、[#67327](https://github.com/PaddlePaddle/Paddle/pull/67327)、[#66827](https://github.com/PaddlePaddle/Paddle/pull/66827)、[#67201](https://github.com/PaddlePaddle/Paddle/pull/67201)、[#66892](https://github.com/PaddlePaddle/Paddle/pull/66892)、[#67377](https://github.com/PaddlePaddle/Paddle/pull/67377)、[#66619](https://github.com/PaddlePaddle/Paddle/pull/66619)、[#67037](https://github.com/PaddlePaddle/Paddle/pull/67037)、[#67412](https://github.com/PaddlePaddle/Paddle/pull/67412)、[#67394](https://github.com/PaddlePaddle/Paddle/pull/67394)、[#67374](https://github.com/PaddlePaddle/Paddle/pull/67374)、[#67418](https://github.com/PaddlePaddle/Paddle/pull/67418)、[#67348](https://github.com/PaddlePaddle/Paddle/pull/67348)、[#67337](https://github.com/PaddlePaddle/Paddle/pull/67337)、[#67390](https://github.com/PaddlePaddle/Paddle/pull/67390)、[#67407](https://github.com/PaddlePaddle/Paddle/pull/67407)、[#67491](https://github.com/PaddlePaddle/Paddle/pull/67491)、[#67422](https://github.com/PaddlePaddle/Paddle/pull/67422)、[#67461](https://github.com/PaddlePaddle/Paddle/pull/67461)、[#67458](https://github.com/PaddlePaddle/Paddle/pull/67458)、[#67486](ht
tps://github.com/PaddlePaddle/Paddle/pull/67486)、[#67490](https://github.com/PaddlePaddle/Paddle/pull/67490)、[#67462](https://github.com/PaddlePaddle/Paddle/pull/67462)、[#67364](https://github.com/PaddlePaddle/Paddle/pull/67364)、[#67435](https://github.com/PaddlePaddle/Paddle/pull/67435)、[#67665](https://github.com/PaddlePaddle/Paddle/pull/67665)、[#67426](https://github.com/PaddlePaddle/Paddle/pull/67426)、[#67507](https://github.com/PaddlePaddle/Paddle/pull/67507)、[#67730](https://github.com/PaddlePaddle/Paddle/pull/67730)、[#67776](https://github.com/PaddlePaddle/Paddle/pull/67776)、[#67806](https://github.com/PaddlePaddle/Paddle/pull/67806)、[#67803](https://github.com/PaddlePaddle/Paddle/pull/67803)、[#67788](https://github.com/PaddlePaddle/Paddle/pull/67788)、[#67705](https://github.com/PaddlePaddle/Paddle/pull/67705)、[#67814](https://github.com/PaddlePaddle/Paddle/pull/67814)、[#67858](https://github.com/PaddlePaddle/Paddle/pull/67858)、[#67751](https://github.com/PaddlePaddle/Paddle/pull/67751)、[#67875](https://github.com/PaddlePaddle/Paddle/pull/67875)、[#67663](https://github.com/PaddlePaddle/Paddle/pull/67663)、[#67434](https://github.com/PaddlePaddle/Paddle/pull/67434)、[#67818](https://github.com/PaddlePaddle/Paddle/pull/67818)、[#68180](https://github.com/PaddlePaddle/Paddle/pull/68180)、[#68547](https://github.com/PaddlePaddle/Paddle/pull/68547)、[#68548](https://github.com/PaddlePaddle/Paddle/pull/68548)、[#68670](https://github.com/PaddlePaddle/Paddle/pull/68670)、[#68964](https://github.com/PaddlePaddle/Paddle/pull/68964)、[#68929](https://github.com/PaddlePaddle/Paddle/pull/68929)、[#68907](https://github.com/PaddlePaddle/Paddle/pull/68907)、[#68917](https://github.com/PaddlePaddle/Paddle/pull/68917)、[#68984](https://github.com/PaddlePaddle/Paddle/pull/68984)、[#68644](https://github.com/PaddlePaddle/Paddle/pull/68644)、[#69167](https://github.com/PaddlePaddle/Paddle/pull/69167)、[#68975](https://github.com/PaddlePaddle/Paddle/pull/68975)、[#68947](https://github.com/Pad
dlePaddle/Paddle/pull/68947)、[#68978](https://github.com/PaddlePaddle/Paddle/pull/68978)、[#68980](https://github.com/PaddlePaddle/Paddle/pull/68980)、[#68979](https://github.com/PaddlePaddle/Paddle/pull/68979)、[#69329](https://github.com/PaddlePaddle/Paddle/pull/69329)、[#69055](https://github.com/PaddlePaddle/Paddle/pull/69055)、[#69331](https://github.com/PaddlePaddle/Paddle/pull/69331)、[#69414](https://github.com/PaddlePaddle/Paddle/pull/69414)、[#69335](https://github.com/PaddlePaddle/Paddle/pull/69335)、[#69017](https://github.com/PaddlePaddle/Paddle/pull/69017)、[#69344](https://github.com/PaddlePaddle/Paddle/pull/69344)、[#69069](https://github.com/PaddlePaddle/Paddle/pull/69069)、[#69698](https://github.com/PaddlePaddle/Paddle/pull/69698)、[#69919](https://github.com/PaddlePaddle/Paddle/pull/69919)、[#69964](https://github.com/PaddlePaddle/Paddle/pull/69964)、[#70337](https://github.com/PaddlePaddle/Paddle/pull/70337)、[#70282](https://github.com/PaddlePaddle/Paddle/pull/70282)、[#70741](https://github.com/PaddlePaddle/Paddle/pull/70741)、[#70818](https://github.com/PaddlePaddle/Paddle/pull/70818)、[#71031](https://github.com/PaddlePaddle/Paddle/pull/71031)、[#70541](https://github.com/PaddlePaddle/Paddle/pull/70541)、[#66609](https://github.com/PaddlePaddle/Paddle/pull/66609)、[#66889](https://github.com/PaddlePaddle/Paddle/pull/66889)、[#66633](https://github.com/PaddlePaddle/Paddle/pull/66633)、[#66735](https://github.com/PaddlePaddle/Paddle/pull/66735)、[#66935](https://github.com/PaddlePaddle/Paddle/pull/66935)、[#66627](https://github.com/PaddlePaddle/Paddle/pull/66627)、[#66730](https://github.com/PaddlePaddle/Paddle/pull/66730)、[#67210](https://github.com/PaddlePaddle/Paddle/pull/67210)、[#67115](https://github.com/PaddlePaddle/Paddle/pull/67115)、[#67275](https://github.com/PaddlePaddle/Paddle/pull/67275)、[#67472](https://github.com/PaddlePaddle/Paddle/pull/67472)、[#67577](https://github.com/PaddlePaddle/Paddle/pull/67577)、[#67328](https://github.com/PaddlePaddle/Paddle/pul
l/67328)、[#67566](https://github.com/PaddlePaddle/Paddle/pull/67566)、[#67451](https://github.com/PaddlePaddle/Paddle/pull/67451)、[#68098](https://github.com/PaddlePaddle/Paddle/pull/68098)、[#68225](https://github.com/PaddlePaddle/Paddle/pull/68225)、[#68177](https://github.com/PaddlePaddle/Paddle/pull/68177)、[#68102](https://github.com/PaddlePaddle/Paddle/pull/68102)、[#67951](https://github.com/PaddlePaddle/Paddle/pull/67951)、[#67957](https://github.com/PaddlePaddle/Paddle/pull/67957)、[#68235](https://github.com/PaddlePaddle/Paddle/pull/68235)、[#68447](https://github.com/PaddlePaddle/Paddle/pull/68447)、[#68446](https://github.com/PaddlePaddle/Paddle/pull/68446)、[#68183](https://github.com/PaddlePaddle/Paddle/pull/68183)、[#68318](https://github.com/PaddlePaddle/Paddle/pull/68318)、[#68385](https://github.com/PaddlePaddle/Paddle/pull/68385)、[#67635](https://github.com/PaddlePaddle/Paddle/pull/67635)、[#65623](https://github.com/PaddlePaddle/Paddle/pull/65623)、[#65956](https://github.com/PaddlePaddle/Paddle/pull/65956)、[#66063](https://github.com/PaddlePaddle/Paddle/pull/66063)、[#65992](https://github.com/PaddlePaddle/Paddle/pull/65992)、[#65880](https://github.com/PaddlePaddle/Paddle/pull/65880)、[#66343](https://github.com/PaddlePaddle/Paddle/pull/66343)、[#65889](https://github.com/PaddlePaddle/Paddle/pull/65889)、[#66606](https://github.com/PaddlePaddle/Paddle/pull/66606)、[#66618](https://github.com/PaddlePaddle/Paddle/pull/66618)、[#66737](https://github.com/PaddlePaddle/Paddle/pull/66737)、[#66607](https://github.com/PaddlePaddle/Paddle/pull/66607)、[#66579](https://github.com/PaddlePaddle/Paddle/pull/66579)、[#66732](https://github.com/PaddlePaddle/Paddle/pull/66732)、[#66849](https://github.com/PaddlePaddle/Paddle/pull/66849)、[#66400](https://github.com/PaddlePaddle/Paddle/pull/66400)、[#66952](https://github.com/PaddlePaddle/Paddle/pull/66952)、[#66570](https://github.com/PaddlePaddle/Paddle/pull/66570)、[#66967](https://github.com/PaddlePaddle/Paddle/pull/66967)、[#66595](ht
tps://github.com/PaddlePaddle/Paddle/pull/66595)、[#67121](https://github.com/PaddlePaddle/Paddle/pull/67121)、[#67206](https://github.com/PaddlePaddle/Paddle/pull/67206)、[#67444](https://github.com/PaddlePaddle/Paddle/pull/67444)、[#67494](https://github.com/PaddlePaddle/Paddle/pull/67494)、[#67499](https://github.com/PaddlePaddle/Paddle/pull/67499)、[#67267](https://github.com/PaddlePaddle/Paddle/pull/67267)、[#67567](https://github.com/PaddlePaddle/Paddle/pull/67567)、[#67455](https://github.com/PaddlePaddle/Paddle/pull/67455)、[#67161](https://github.com/PaddlePaddle/Paddle/pull/67161)、[#67581](https://github.com/PaddlePaddle/Paddle/pull/67581)、[#67539](https://github.com/PaddlePaddle/Paddle/pull/67539)、[#67625](https://github.com/PaddlePaddle/Paddle/pull/67625)、[#67690](https://github.com/PaddlePaddle/Paddle/pull/67690)、[#67454](https://github.com/PaddlePaddle/Paddle/pull/67454)、[#67731](https://github.com/PaddlePaddle/Paddle/pull/67731)、[#67734](https://github.com/PaddlePaddle/Paddle/pull/67734)、[#67735](https://github.com/PaddlePaddle/Paddle/pull/67735)、[#67607](https://github.com/PaddlePaddle/Paddle/pull/67607)、[#67413](https://github.com/PaddlePaddle/Paddle/pull/67413)、[#67387](https://github.com/PaddlePaddle/Paddle/pull/67387)、[#67882](https://github.com/PaddlePaddle/Paddle/pull/67882)、[#67864](https://github.com/PaddlePaddle/Paddle/pull/67864)、[#67503](https://github.com/PaddlePaddle/Paddle/pull/67503)、[#67861](https://github.com/PaddlePaddle/Paddle/pull/67861)、[#67888](https://github.com/PaddlePaddle/Paddle/pull/67888)、[#67884](https://github.com/PaddlePaddle/Paddle/pull/67884)、[#67826](https://github.com/PaddlePaddle/Paddle/pull/67826)、[#68044](https://github.com/PaddlePaddle/Paddle/pull/68044)、[#67851](https://github.com/PaddlePaddle/Paddle/pull/67851)、[#68276](https://github.com/PaddlePaddle/Paddle/pull/68276)、[#69888](https://github.com/PaddlePaddle/Paddle/pull/69888)、[#70093](https://github.com/PaddlePaddle/Paddle/pull/70093)、[#70436](https://github.com/Pad
dlePaddle/Paddle/pull/70436)、[#70914](https://github.com/PaddlePaddle/Paddle/pull/70914)、[#71222](https://github.com/PaddlePaddle/Paddle/pull/71222))
-7. Optimized some front-end passes to enhance the robustness of the front-end processing flow and improve the performance of computationally intensive subgraphs. ([#65142](https://github.com/PaddlePaddle/Paddle/pull/65142), [#67466](https://github.com/PaddlePaddle/Paddle/pull/67466), [#69228](https://github.com/PaddlePaddle/Paddle/pull/69228), [#70994](https://github.com/PaddlePaddle/Paddle/pull/70994), [#71226](https://github.com/PaddlePaddle/Paddle/pull/71226), [#71297](https://github.com/PaddlePaddle/Paddle/pull/71297), [#71443](https://github.com/PaddlePaddle/Paddle/pull/71443))
-8. Designed new backend IR basic components and related Pass interfaces to provide a more concise and efficient way of developing optimization strategies. Through automatic pruning strategies, it can effectively reduce the traversal overhead of backend IR. ([#70485](https://github.com/PaddlePaddle/Paddle/pull/70485), [#70765](https://github.com/PaddlePaddle/Paddle/pull/70765), [#71042](https://github.com/PaddlePaddle/Paddle/pull/71042), [#70952](https://github.com/PaddlePaddle/Paddle/pull/70952), [#69454](https://github.com/PaddlePaddle/Paddle/pull/69454), [#70361](https://github.com/PaddlePaddle/Paddle/pull/70361), [#70334](https://github.com/PaddlePaddle/Paddle/pull/70334), [#70406](https://github.com/PaddlePaddle/Paddle/pull/70406), [#70191](https://github.com/PaddlePaddle/Paddle/pull/70191), [#70462](https://github.com/PaddlePaddle/Paddle/pull/70462), [#70548](https://github.com/PaddlePaddle/Paddle/pull/70548), [#70592](https://github.com/PaddlePaddle/Paddle/pull/70592), [#70437](https://github.com/PaddlePaddle/Paddle/pull/70437), [#70619](https://github.com/PaddlePaddle/Paddle/pull/70619), [#70543](https://github.com/PaddlePaddle/Paddle/pull/70543), [#69611](https://github.com/PaddlePaddle/Paddle/pull/69611), [#70739](https://github.com/PaddlePaddle/Paddle/pull/70739), [#70533](https://github.com/PaddlePaddle/Paddle/pull/70533), [#70696](https://github.com/PaddlePaddle/Paddle/pull/70696), [#70498](https://github.com/PaddlePaddle/Paddle/pull/70498), [#70829](https://github.com/PaddlePaddle/Paddle/pull/70829), [#71111](https://github.com/PaddlePaddle/Paddle/pull/71111), [#70883](https://github.com/PaddlePaddle/Paddle/pull/70883))
+- Support FP8 matrix multiplication acceleration to enhance computational performance and precision adaptability. [#73092](https://github.com/PaddlePaddle/Paddle/pull/73092)
+- Support for 0-size Tensor execution. [#71829](https://github.com/PaddlePaddle/Paddle/pull/71829), [#72263](https://github.com/PaddlePaddle/Paddle/pull/72263), [#72244](https://github.com/PaddlePaddle/Paddle/pull/72244), [#72814](https://github.com/PaddlePaddle/Paddle/pull/72814)
+- Support for DeepEP. [#73495](https://github.com/PaddlePaddle/Paddle/pull/73495)
+- The CINN backend is enabled by default. [#71838](https://github.com/PaddlePaddle/Paddle/pull/71838)
+- Support for SOT-related execution. [#72472](https://github.com/PaddlePaddle/Paddle/pull/72472), [#72559](https://github.com/PaddlePaddle/Paddle/pull/72559), [#72466](https://github.com/PaddlePaddle/Paddle/pull/72466), [#73269](https://github.com/PaddlePaddle/Paddle/pull/73269), [#73329](https://github.com/PaddlePaddle/Paddle/pull/73329), [#73405](https://github.com/PaddlePaddle/Paddle/pull/73405), [#73399](https://github.com/PaddlePaddle/Paddle/pull/73399), [#73424](https://github.com/PaddlePaddle/Paddle/pull/73424), [#73509](https://github.com/PaddlePaddle/Paddle/pull/73509)
+- Support for dynamic-to-static conversion. [#73417](https://github.com/PaddlePaddle/Paddle/pull/73417), [#73081](https://github.com/PaddlePaddle/Paddle/pull/73081)
+- Added support for kernels with the stride mechanism. [#73053](https://github.com/PaddlePaddle/Paddle/pull/73053)
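
As background for the stride-related item in the list above, the sketch below shows in plain Python what a strided view is: the same flat storage is addressed through a shape, a tuple of strides, and an offset, so operations such as transpose need no data copy. The names `StridedView` and `contiguous_strides` are hypothetical and the code is a conceptual model, not Paddle's kernel or Tensor implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


def contiguous_strides(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Row-major strides (in elements) for a densely packed array."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


@dataclass
class StridedView:
    """A view over shared flat storage (hypothetical helper)."""

    storage: List[float]
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]
    offset: int = 0

    def __getitem__(self, index: Tuple[int, ...]) -> float:
        # Flat position = offset + sum(index_k * stride_k).
        flat = self.offset + sum(i * s for i, s in zip(index, self.strides))
        return self.storage[flat]

    def transpose(self) -> "StridedView":
        # Reverse shape and strides; the storage is shared, not copied.
        return StridedView(self.storage, self.shape[::-1], self.strides[::-1], self.offset)

    def is_contiguous(self) -> bool:
        # A view is contiguous when its strides match the row-major layout.
        expected = 1
        for dim, stride in zip(reversed(self.shape), reversed(self.strides)):
            if dim != 1 and stride != expected:
                return False
            expected *= dim
        return True
```

A kernel that understands strides can read a non-contiguous view such as the transpose directly, whereas a kernel without that support would first need the data copied into a contiguous buffer.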
### Bug fixes
-1. Fix some bugs in the derivation and implementation logic of operator symbols. ([#65185](https://github.com/PaddlePaddle/Paddle/pull/65185), [#65231](https://github.com/PaddlePaddle/Paddle/pull/65231), [#65266](https://github.com/PaddlePaddle/Paddle/pull/65266), [#65951](https://github.com/PaddlePaddle/Paddle/pull/65951), [#67142](https://github.com/PaddlePaddle/Paddle/pull/67142), [#67286](https://github.com/PaddlePaddle/Paddle/pull/67286), [#65958](https://github.com/PaddlePaddle/Paddle/pull/65958), [#65955](https://github.com/PaddlePaddle/Paddle/pull/65955), [#66470](https://github.com/PaddlePaddle/Paddle/pull/66470), [#66764](https://github.com/PaddlePaddle/Paddle/pull/66764), [#66036](https://github.com/PaddlePaddle/Paddle/pull/66036), [#66662](https://github.com/PaddlePaddle/Paddle/pull/66662), [#66741](https://github.com/PaddlePaddle/Paddle/pull/66741), [#66745](https://github.com/PaddlePaddle/Paddle/pull/66745), [#66807](https://github.com/PaddlePaddle/Paddle/pull/66807), [#66791](https://github.com/PaddlePaddle/Paddle/pull/66791), [#66859](https://github.com/PaddlePaddle/Paddle/pull/66859), [#66880](https://github.com/PaddlePaddle/Paddle/pull/66880), [#66962](https://github.com/PaddlePaddle/Paddle/pull/66962))
-2. Fixed bugs in the lowering of some special operators to the compiler. ([#68698](https://github.com/PaddlePaddle/Paddle/pull/68698), [#68699](https://github.com/PaddlePaddle/Paddle/pull/68699), [#68691](https://github.com/PaddlePaddle/Paddle/pull/68691), [#68948](https://github.com/PaddlePaddle/Paddle/pull/68948), [#70144](https://github.com/PaddlePaddle/Paddle/pull/70144), [#70895](https://github.com/PaddlePaddle/Paddle/pull/70895))
-3. Fixed the issue of errors reported in some scenarios when integrating operators. ([#67038](https://github.com/PaddlePaddle/Paddle/pull/67038), [#67400](https://github.com/PaddlePaddle/Paddle/pull/67400), [#67655](https://github.com/PaddlePaddle/Paddle/pull/67655), [#67723](https://github.com/PaddlePaddle/Paddle/pull/67723), [#68029](https://github.com/PaddlePaddle/Paddle/pull/68029), [#68042](https://github.com/PaddlePaddle/Paddle/pull/68042), [#68888](https://github.com/PaddlePaddle/Paddle/pull/68888), [#69250](https://github.com/PaddlePaddle/Paddle/pull/69250), [#69937](https://github.com/PaddlePaddle/Paddle/pull/69937), [#70924](https://github.com/PaddlePaddle/Paddle/pull/70924))
-4. Fix the correctness issue of the backend when handling extreme values, and improve the robustness of the compiler. ([#68327](https://github.com/PaddlePaddle/Paddle/pull/68327))
-5. Fixed implementation logic bugs in the backend Schedule and post-processing tuning process, resolving errors and performance issues in some cases. ([#68605](https://github.com/PaddlePaddle/Paddle/pull/68605), [#68937](https://github.com/PaddlePaddle/Paddle/pull/68937), [#68587](https://github.com/PaddlePaddle/Paddle/pull/68587), [#69060](https://github.com/PaddlePaddle/Paddle/pull/69060), [#69608](https://github.com/PaddlePaddle/Paddle/pull/69608), [#71471](https://github.com/PaddlePaddle/Paddle/pull/71471), [#71068](https://github.com/PaddlePaddle/Paddle/pull/71068))
-6. Resolved the issue of randomness in the operator fusion process. ([#69547](https://github.com/PaddlePaddle/Paddle/pull/69547), [#70931](https://github.com/PaddlePaddle/Paddle/pull/70931))
+- Performance optimization and stability: Optimize training stability, enhance support for Python 3.11+ versions, improve the automatic activation logic of the CINN compiler in dynamic graph mode, fix issues related to dynamic shape inference and gradient backpropagation, optimize GPU kernel execution efficiency (such as for_range and constant folding), improve NPU memory copy and context management, and enhance large-scale model training performance and hardware utilization. [#71777](https://github.com/PaddlePaddle/Paddle/pull/71777), [#71837](https://github.com/PaddlePaddle/Paddle/pull/71837), [#71834](https://github.com/PaddlePaddle/Paddle/pull/71834), [#71950](https://github.com/PaddlePaddle/Paddle/pull/71950), [#71960](https://github.com/PaddlePaddle/Paddle/pull/71960), [#72103](https://github.com/PaddlePaddle/Paddle/pull/72103), [#70652](https://github.com/PaddlePaddle/Paddle/pull/70652), [#72313](https://github.com/PaddlePaddle/Paddle/pull/72313), [#72405](https://github.com/PaddlePaddle/Paddle/pull/72405), [#72581](https://github.com/PaddlePaddle/Paddle/pull/72581), [#73418](https://github.com/PaddlePaddle/Paddle/pull/73418)
+- Large Tensor support: Extended operators to support very large tensors, including mathematical operations (lerp/mean/bmm/trapezoid), tensor operations (arg_min_max/diag/prelu), padding, comparisons (allclose/isclose), and fused operators (softmax_mask_fuse), and addressed compatibility issues in mixed-precision training. [#71916](https://github.com/PaddlePaddle/Paddle/pull/71916), [#71970](https://github.com/PaddlePaddle/Paddle/pull/71970), [#72516](https://github.com/PaddlePaddle/Paddle/pull/72516), [#72517](https://github.com/PaddlePaddle/Paddle/pull/72517), [#72638](https://github.com/PaddlePaddle/Paddle/pull/72638), [#72652](https://github.com/PaddlePaddle/Paddle/pull/72652), [#73046](https://github.com/PaddlePaddle/Paddle/pull/73046), [#73093](https://github.com/PaddlePaddle/Paddle/pull/73093), [#73136](https://github.com/PaddlePaddle/Paddle/pull/73136), [#72679](https://github.com/PaddlePaddle/Paddle/pull/72679), [#73174](https://github.com/PaddlePaddle/Paddle/pull/73174), [#73198](https://github.com/PaddlePaddle/Paddle/pull/73198), [#73121](https://github.com/PaddlePaddle/Paddle/pull/73121), [#73096](https://github.com/PaddlePaddle/Paddle/pull/73096), [#73261](https://github.com/PaddlePaddle/Paddle/pull/73261), [#73201](https://github.com/PaddlePaddle/Paddle/pull/73201), [#73291](https://github.com/PaddlePaddle/Paddle/pull/73291), [#73373](https://github.com/PaddlePaddle/Paddle/pull/73373), [#73318](https://github.com/PaddlePaddle/Paddle/pull/73318), [#73436](https://github.com/PaddlePaddle/Paddle/pull/73436), [#72705](https://github.com/PaddlePaddle/Paddle/pull/72705), [#72276](https://github.com/PaddlePaddle/Paddle/pull/72276), [#73135](https://github.com/PaddlePaddle/Paddle/pull/73135), [#73304](https://github.com/PaddlePaddle/Paddle/pull/73304), [#73381](https://github.com/PaddlePaddle/Paddle/pull/73381), [#72712](https://github.com/PaddlePaddle/Paddle/pull/72712), [#72717](https://github.com/PaddlePaddle/Paddle/pull/72717), 
[#72634](https://github.com/PaddlePaddle/Paddle/pull/72634), [#72562](https://github.com/PaddlePaddle/Paddle/pull/72562), [#72628](https://github.com/PaddlePaddle/Paddle/pull/72628), [#72706](https://github.com/PaddlePaddle/Paddle/pull/72706), [#72831](https://github.com/PaddlePaddle/Paddle/pull/72831), [#72888](https://github.com/PaddlePaddle/Paddle/pull/72888), [#72753](https://github.com/PaddlePaddle/Paddle/pull/72753), [#72931](https://github.com/PaddlePaddle/Paddle/pull/72931), [#73021](https://github.com/PaddlePaddle/Paddle/pull/73021), [#73064](https://github.com/PaddlePaddle/Paddle/pull/73064), [#73069](https://github.com/PaddlePaddle/Paddle/pull/73069), [#73153](https://github.com/PaddlePaddle/Paddle/pull/73153), [#73118](https://github.com/PaddlePaddle/Paddle/pull/73118), [#73252](https://github.com/PaddlePaddle/Paddle/pull/73252), [#73253](https://github.com/PaddlePaddle/Paddle/pull/73253), [#73262](https://github.com/PaddlePaddle/Paddle/pull/73262), [#73259](https://github.com/PaddlePaddle/Paddle/pull/73259), [#73288](https://github.com/PaddlePaddle/Paddle/pull/73288), [#73105](https://github.com/PaddlePaddle/Paddle/pull/73105), [#73275](https://github.com/PaddlePaddle/Paddle/pull/73275), [#73284](https://github.com/PaddlePaddle/Paddle/pull/73284), [#73110](https://github.com/PaddlePaddle/Paddle/pull/73110), [#73335](https://github.com/PaddlePaddle/Paddle/pull/73335), [#73342](https://github.com/PaddlePaddle/Paddle/pull/73342), [#73447](https://github.com/PaddlePaddle/Paddle/pull/73447), [#73460](https://github.com/PaddlePaddle/Paddle/pull/73460), [#73194](https://github.com/PaddlePaddle/Paddle/pull/73194)
+- 0-Size Tensor fixes: Fixed computation errors triggered by 0-size tensors in pooling (max_pool1d/lp_pool1d), linear algebra (matrix_rank), statistics (std/nanmedian), and element-wise comparison operators, ensuring numerical stability and API consistency for these edge-case inputs. [#71961](https://github.com/PaddlePaddle/Paddle/pull/71961), [#72017](https://github.com/PaddlePaddle/Paddle/pull/72017), [#72785](https://github.com/PaddlePaddle/Paddle/pull/72785), [#73214](https://github.com/PaddlePaddle/Paddle/pull/73214), [#73263](https://github.com/PaddlePaddle/Paddle/pull/73263), [#73267](https://github.com/PaddlePaddle/Paddle/pull/73267), [#73280](https://github.com/PaddlePaddle/Paddle/pull/73280), [#72444](https://github.com/PaddlePaddle/Paddle/pull/72444), [#72437](https://github.com/PaddlePaddle/Paddle/pull/72437), [#72460](https://github.com/PaddlePaddle/Paddle/pull/72460), [#73090](https://github.com/PaddlePaddle/Paddle/pull/73090), [#73516](https://github.com/PaddlePaddle/Paddle/pull/73516), [#72807](https://github.com/PaddlePaddle/Paddle/pull/72807), [#72799](https://github.com/PaddlePaddle/Paddle/pull/72799), [#72800](https://github.com/PaddlePaddle/Paddle/pull/72800), [#72809](https://github.com/PaddlePaddle/Paddle/pull/72809), [#73497](https://github.com/PaddlePaddle/Paddle/pull/73497)
+- API Enhancements and Compatibility: Added support for Python standard library types (dataclasses), expanded API data type compatibility (creation of bfloat16 parameters, automatic inference of -1 dimension), fixed NumPy API interaction errors, and optimized BatchNorm memory layout. [#72059](https://github.com/PaddlePaddle/Paddle/pull/72059), [#72283](https://github.com/PaddlePaddle/Paddle/pull/72283), [#72451](https://github.com/PaddlePaddle/Paddle/pull/72451), [#72512](https://github.com/PaddlePaddle/Paddle/pull/72512), [#72618](https://github.com/PaddlePaddle/Paddle/pull/72618), [#72976](https://github.com/PaddlePaddle/Paddle/pull/72976), [#73084](https://github.com/PaddlePaddle/Paddle/pull/73084), [#73205](https://github.com/PaddlePaddle/Paddle/pull/73205), [#73250](https://github.com/PaddlePaddle/Paddle/pull/73250), [#73111](https://github.com/PaddlePaddle/Paddle/pull/73111), [#73260](https://github.com/PaddlePaddle/Paddle/pull/73260), [#72094](https://github.com/PaddlePaddle/Paddle/pull/72094), [#71844](https://github.com/PaddlePaddle/Paddle/pull/71844), [#71357](https://github.com/PaddlePaddle/Paddle/pull/71357)
+- Memory management and bug fixes: Fixed high-risk issues such as memory overflow (set_value/nonzero), null pointer access (data nullptr), and CUDA graph allocation failure. Fixed memory leaks and computational errors in core operations such as gradient clipping (clip_grad), tensor assignment (assign), and broadcasting (broadcast). Optimized NPU asynchronous execution and predictor GIL release logic to improve system robustness. [#71895](https://github.com/PaddlePaddle/Paddle/pull/71895), [#72101](https://github.com/PaddlePaddle/Paddle/pull/72101), [#72133](https://github.com/PaddlePaddle/Paddle/pull/72133), [#72149](https://github.com/PaddlePaddle/Paddle/pull/72149), [#72176](https://github.com/PaddlePaddle/Paddle/pull/72176), [#72314](https://github.com/PaddlePaddle/Paddle/pull/72314), [#72256](https://github.com/PaddlePaddle/Paddle/pull/72256), [#72757](https://github.com/PaddlePaddle/Paddle/pull/72757), [#72749](https://github.com/PaddlePaddle/Paddle/pull/72749), [#72792](https://github.com/PaddlePaddle/Paddle/pull/72792), [#72815](https://github.com/PaddlePaddle/Paddle/pull/72815), [#72819](https://github.com/PaddlePaddle/Paddle/pull/72819), [#72958](https://github.com/PaddlePaddle/Paddle/pull/72958), [#73023](https://github.com/PaddlePaddle/Paddle/pull/73023), [#73103](https://github.com/PaddlePaddle/Paddle/pull/73103), [#73014](https://github.com/PaddlePaddle/Paddle/pull/73014), [#73137](https://github.com/PaddlePaddle/Paddle/pull/73137), [#73256](https://github.com/PaddlePaddle/Paddle/pull/73256), [#73211](https://github.com/PaddlePaddle/Paddle/pull/73211), [#73251](https://github.com/PaddlePaddle/Paddle/pull/73251), [#73210](https://github.com/PaddlePaddle/Paddle/pull/73210), [#73415](https://github.com/PaddlePaddle/Paddle/pull/73415), [#73206](https://github.com/PaddlePaddle/Paddle/pull/73206), [#71983](https://github.com/PaddlePaddle/Paddle/pull/71983), [#72485](https://github.com/PaddlePaddle/Paddle/pull/72485), [#72561](https://github.com/PaddlePaddle/Paddle/pull/72561)
+- Other important fixes: Fixed defects in scientific computation, save/load modules, improved Slice operator kernel configuration, optimized fallback strategy for dynamic shape inference, and refined exception throwing and type checking logic. [#71810](https://github.com/PaddlePaddle/Paddle/pull/71810), [#72246](https://github.com/PaddlePaddle/Paddle/pull/72246), [#72378](https://github.com/PaddlePaddle/Paddle/pull/72378), [#72467](https://github.com/PaddlePaddle/Paddle/pull/72467), [#72635](https://github.com/PaddlePaddle/Paddle/pull/72635), [#72751](https://github.com/PaddlePaddle/Paddle/pull/72751), [#72044](https://github.com/PaddlePaddle/Paddle/pull/72044), [#72051](https://github.com/PaddlePaddle/Paddle/pull/72051), [#73231](https://github.com/PaddlePaddle/Paddle/pull/73231), [#73109](https://github.com/PaddlePaddle/Paddle/pull/73109)
+- Fixed issues related to SOT. [#71932](https://github.com/PaddlePaddle/Paddle/pull/71932), [#71971](https://github.com/PaddlePaddle/Paddle/pull/71971), [#72194](https://github.com/PaddlePaddle/Paddle/pull/72194), [#72288](https://github.com/PaddlePaddle/Paddle/pull/72288), [#72306](https://github.com/PaddlePaddle/Paddle/pull/72306), [#72367](https://github.com/PaddlePaddle/Paddle/pull/72367), [#72495](https://github.com/PaddlePaddle/Paddle/pull/72495), [#72522](https://github.com/PaddlePaddle/Paddle/pull/72522), [#72704](https://github.com/PaddlePaddle/Paddle/pull/72704), [#72631](https://github.com/PaddlePaddle/Paddle/pull/72631), [#72737](https://github.com/PaddlePaddle/Paddle/pull/72737), [#73067](https://github.com/PaddlePaddle/Paddle/pull/73067), [#73030](https://github.com/PaddlePaddle/Paddle/pull/73030), [#73059](https://github.com/PaddlePaddle/Paddle/pull/73059), [#73282](https://github.com/PaddlePaddle/Paddle/pull/73282), [#73511](https://github.com/PaddlePaddle/Paddle/pull/73511), [#73526](https://github.com/PaddlePaddle/Paddle/pull/73526), [#73549](https://github.com/PaddlePaddle/Paddle/pull/73549), [#73515](https://github.com/PaddlePaddle/Paddle/pull/73515)
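
For context on the Large Tensor item above: a common reason operators fail on very large tensors is that the element count, or a flat offset into the data, no longer fits in a signed 32-bit integer. The sketch below is illustrative pure Python, not Paddle code. The helper names (`numel`, `as_int32`, `exceeds_int32`) and the example shape are assumptions made for this example; the linked PRs remain the reference for what each operator actually changed.

```python
import ctypes

INT32_MAX = 2**31 - 1


def numel(shape):
    """Return the total element count of a tensor with the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


def as_int32(value):
    """Emulate storing a value in a signed 32-bit integer (wraps on overflow)."""
    return ctypes.c_int32(value).value


def exceeds_int32(shape):
    """True when the element count cannot be represented as an int32."""
    return numel(shape) > INT32_MAX


# Hypothetical shape with exactly 2**31 elements.
shape = (65536, 32768)
print(numel(shape))            # true element count
print(as_int32(numel(shape)))  # the same count after wrapping into int32
print(exceeds_int32(shape))    # whether 64-bit indexing would be required
```

A kernel that keeps its loop bound or offset in a 32-bit variable would see the wrapped value rather than the true count, which is why such code paths typically need 64-bit indexing once a shape crosses this threshold.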
-## 4. Automatic parallel architecture
-
-In the official 3.0 version, we have conducted in-depth verification and refinement of the automatic parallel architecture to better support the pre-training + fine-tuning process for common large model scenarios such as pure text dense models, pure text sparse models (MoE), and multi-modal understanding models. Specifically, we have added segmentation derivation rules for over 20 operators tailored for these scenarios, and support the conversion of automatic parallel training parameters into manual parallel parameters for downstream inference, making automatic parallelism fully usable and helping users reduce the development cost of large model parallel programs. Additionally, to further simplify the distributed development process for users, we have introduced a new `paddle.distributed.parallel` interface. Based on the encapsulation of distributed tensor notation syntax, it supports users in non-intrusively configuring common parallel strategies such as data parallelism, model parallelism, and pipeline parallelism outside of model networking. Furthermore, the static graph automatic parallel architecture has undergone a comprehensive upgrade based on PIR, with the underlying basic components, core modules, parallel strategies, and performance optimization strategies all implemented uniformly based on the extended PIR `DistDialect`. This has further enhanced the dynamic and static consistency of automatic parallelism, achieving performance levels on the Llama series models that are on par with or even surpass manual parallelism.
-
-### New Features
+### Improvements
-- Added the `paddle.distributed.parallel` interface to support configuring common parallel strategies outside of model networking, simplifying the distributed development process. [#69004](https://github.com/PaddlePaddle/Paddle/pull/69004), [#69033](https://github.com/PaddlePaddle/Paddle/pull/69033), [#69077](https://github.com/PaddlePaddle/Paddle/pull/69077), [#69136](https://github.com/PaddlePaddle/Paddle/pull/69136), [#69169](https://github.com/PaddlePaddle/Paddle/pull/69169), [#69212](https://github.com/PaddlePaddle/Paddle/pull/69212), [#69217](https://github.com/PaddlePaddle/Paddle/pull/69217), [#69283](https://github.com/PaddlePaddle/Paddle/pull/69283), [#69288](https://github.com/PaddlePaddle/Paddle/pull/69288), [#69326](https://github.com/PaddlePaddle/Paddle/pull/69326), [#69365](https://github.com/PaddlePaddle/Paddle/pull/69365), [#69384](https://github.com/PaddlePaddle/Paddle/pull/69384), [#69426](https://github.com/PaddlePaddle/Paddle/pull/69426), [#69443](https://github.com/PaddlePaddle/Paddle/pull/69443), [#69462](https://github.com/PaddlePaddle/Paddle/pull/69462), [#69492](https://github.com/PaddlePaddle/Paddle/pull/69492), [#69628](https://github.com/PaddlePaddle/Paddle/pull/69628), [#69677](https://github.com/PaddlePaddle/Paddle/pull/69677), [#69697](https://github.com/PaddlePaddle/Paddle/pull/69697), [#69776](https://github.com/PaddlePaddle/Paddle/pull/69776), [#69896](https://github.com/PaddlePaddle/Paddle/pull/69896), [#70138](https://github.com/PaddlePaddle/Paddle/pull/70138), [#70182](https://github.com/PaddlePaddle/Paddle/pull/70182), [#70539](https://github.com/PaddlePaddle/Paddle/pull/70539), [#71116](https://github.com/PaddlePaddle/Paddle/pull/71116), [#71210](https://github.com/PaddlePaddle/Paddle/pull/71210)
-- For pure text sparse scenarios, it supports MoE expert parallelism, implements an expert parallelism to mesh partitioning conversion mechanism, and supports automatic invocation of all2all communication. [#66462](https://github.com/PaddlePaddle/Paddle/pull/66462), [#66750](https://github.com/PaddlePaddle/Paddle/pull/66750), [#68004](https://github.com/PaddlePaddle/Paddle/pull/68004), [#68053](https://github.com/PaddlePaddle/Paddle/pull/68053), [#68187](https://github.com/PaddlePaddle/Paddle/pull/68187), [#68477](https://github.com/PaddlePaddle/Paddle/pull/68477), [#69098](https://github.com/PaddlePaddle/Paddle/pull/69098), [#69262](https://github.com/PaddlePaddle/Paddle/pull/69262), [#69296](https://github.com/PaddlePaddle/Paddle/pull/69296), [#70715](https://github.com/PaddlePaddle/Paddle/pull/70715), [#71292](https://github.com/PaddlePaddle/Paddle/pull/71292), [#71320](https://github.com/PaddlePaddle/Paddle/pull/71320)
-- To meet the needs of users in extreme manual optimization scenarios for managing segmentation status and communication operations, and to address the issue of being unable to use tensor segmentation syntax in some non-SPMD scenarios, we have added the `LocalLayer` interface to support a hybrid network of automatic and manual parallelism. [#70519](https://github.com/PaddlePaddle/Paddle/pull/70519), [#70525](https://github.com/PaddlePaddle/Paddle/pull/70525), [#70600](https://github.com/PaddlePaddle/Paddle/pull/70600), [#71232](https://github.com/PaddlePaddle/Paddle/pull/71232), [#71264](https://github.com/PaddlePaddle/Paddle/pull/71264), [#71373](https://github.com/PaddlePaddle/Paddle/pull/71373)
-- To enable users to run automatic parallel programs using domestic hardware, we have completed the adaptation for Kunlun chips, and support for other chips is also underway. [#70997](https://github.com/PaddlePaddle/Paddle/pull/70997), [#71126](https://github.com/PaddlePaddle/Paddle/pull/71126), [#71229](https://github.com/PaddlePaddle/Paddle/pull/71229), [#71289](https://github.com/PaddlePaddle/Paddle/pull/71289), [#71425](https://github.com/PaddlePaddle/Paddle/pull/71425), [#71500](https://github.com/PaddlePaddle/Paddle/pull/71500)
-- For situations where the data dimension cannot be divided evenly by the device dimension, non-balanced splitting derivation and splitting transformation are supported. [#66103](https://github.com/PaddlePaddle/Paddle/pull/66103), [#67756](https://github.com/PaddlePaddle/Paddle/pull/67756), [#69265](https://github.com/PaddlePaddle/Paddle/pull/69265), [#70072](https://github.com/PaddlePaddle/Paddle/pull/70072)
-- The shard_dataloader function has been upgraded to support setting the gradient accumulation step count through `batch_sampler`, and also supports scenarios with multiple model inputs. [#65325](https://github.com/PaddlePaddle/Paddle/pull/65325), [#70659](https://github.com/PaddlePaddle/Paddle/pull/70659)
-- Upgrades have been made to the parameter saving and loading functions, supporting asynchronous storage of parameters, mutual loading of `master_weight` between dynamic and static graphs, as well as parameter version control and offload functions. [#66858](https://github.com/PaddlePaddle/Paddle/pull/66858), [#67427](https://github.com/PaddlePaddle/Paddle/pull/67427), [#70105](https://github.com/PaddlePaddle/Paddle/pull/70105), [#70639](https://github.com/PaddlePaddle/Paddle/pull/70639)
-- To meet users' needs for converting dynamic networking involving `PyLayer` to static, support has been added for `PyLayer` in static graph mode, allowing distributed tensors to be run within `PyLayer`. [#67326](https://github.com/PaddlePaddle/Paddle/pull/67326), [#68190](https://github.com/PaddlePaddle/Paddle/pull/68190), [#69089](https://github.com/PaddlePaddle/Paddle/pull/69089), [#70831](https://github.com/PaddlePaddle/Paddle/pull/70831)
-- To address the issue of incorrect dynamic-to-static conversion caused by inconsistency between the data stream input format and the `input_spec` actually required by the model for dynamic-to-static conversion, the dynamic-to-static conversion interface supports a user-defined `input_spec` feature, allowing users to input the required `input_spec` on their own. [#69183](https://github.com/PaddlePaddle/Paddle/pull/69183)
-- For hybrid parallel scenarios, the gradient clipping strategy has been adapted and supported. [#65259](https://github.com/PaddlePaddle/Paddle/pull/65259), [#65928](https://github.com/PaddlePaddle/Paddle/pull/65928), [#69287](https://github.com/PaddlePaddle/Paddle/pull/69287), [#69760](https://github.com/PaddlePaddle/Paddle/pull/69760), [#71421](https://github.com/PaddlePaddle/Paddle/pull/71421)
-- For scenarios where the number of model layers is not divisible by the number of devices, a non-balanced pipeline parallel strategy is supported, allowing users to split different numbers of network layers at different pipeline stages. [#69728](https://github.com/PaddlePaddle/Paddle/pull/69728), [#70164](https://github.com/PaddlePaddle/Paddle/pull/70164), [#70230](https://github.com/PaddlePaddle/Paddle/pull/70230)
-- Added `set_mesh` and `get_mesh` interfaces to enable users to easily set and retrieve the global mesh. [#69999](https://github.com/PaddlePaddle/Paddle/pull/69999)
-- Added automatic and manual parallelism accuracy alignment switches to facilitate the conversion of existing manual parallelism models to automatic parallelism and verify the accuracy of the results. [#67681](https://github.com/PaddlePaddle/Paddle/pull/67681)
-
-### Functional improvements
-
-Improve and optimize the derivation rules for operator slicing
-
-- Added derivation rules for operators `add_n`, `split`, and `softmax_grad`. [#65606](https://github.com/PaddlePaddle/Paddle/pull/65606), [#69439](https://github.com/PaddlePaddle/Paddle/pull/69439)
-- Added operator splitting derivation rules for `assign` and `embedding_grad`. [#67457](https://github.com/PaddlePaddle/Paddle/pull/67457)
-- Added `clip` operator slicing derivation rule. [#70632](https://github.com/PaddlePaddle/Paddle/pull/70632)
-- Added derivation rules for the `dist_stack` and `gather_nd` operators. [#65426](https://github.com/PaddlePaddle/Paddle/pull/65426)
-- Added the derivation rule for `dropout` operator segmentation. [#70216](https://github.com/PaddlePaddle/Paddle/pull/70216)
-- Added slicing derivation rule for `fused_dropout_add` operator. [#67722](https://github.com/PaddlePaddle/Paddle/pull/67722)
-- Added `fast_ln` custom operator segmentation derivation rule. [#68148](https://github.com/PaddlePaddle/Paddle/pull/68148)
-- Added `greater_equal` and `less_equal` operator slicing derivation rules. [#68868](https://github.com/PaddlePaddle/Paddle/pull/68868)
-- Added `greater_than` and `less_than` operator slicing derivation rules. [#68133](https://github.com/PaddlePaddle/Paddle/pull/68133)
-- Added `if` operator segmentation derivation rule. [#69357](https://github.com/PaddlePaddle/Paddle/pull/69357)
-- Added slicing derivation rules for operators `logical_and`, `logical_not`, `logical_or`, and `logical_xor`. [#67840](https://github.com/PaddlePaddle/Paddle/pull/67840)
-- Added `logsumexp` operator slicing derivation rule. [#67840](https://github.com/PaddlePaddle/Paddle/pull/67840)
-- Added `non_zero` operator slicing derivation rule. [#67996](https://github.com/PaddlePaddle/Paddle/pull/67996)
-- Added `pad` operator slicing derivation rule. [#68304](https://github.com/PaddlePaddle/Paddle/pull/68304)
-- Added the derivation rule for operator segmentation of `p_norm`. [#68317](https://github.com/PaddlePaddle/Paddle/pull/68317)
-- Added the derivation rule for the `scatter_nd` operator's slicing. [#67980](https://github.com/PaddlePaddle/Paddle/pull/67980)
-- Added `sigmoid` operator segmentation derivation rule. [#71092](https://github.com/PaddlePaddle/Paddle/pull/71092)
-
-Static graph automatic parallel architecture based on PIR upgrade
-
-- Upgrades to Automatic Mixed Precision (AMP) training. [#65089](https://github.com/PaddlePaddle/Paddle/pull/65089), [#65892](https://github.com/PaddlePaddle/Paddle/pull/65892), [#66418](https://github.com/PaddlePaddle/Paddle/pull/66418), [#66674](https://github.com/PaddlePaddle/Paddle/pull/66674), [#68545](https://github.com/PaddlePaddle/Paddle/pull/68545)
-- Upgrade of recalculation strategy. [#69681](https://github.com/PaddlePaddle/Paddle/pull/69681), [#70064](https://github.com/PaddlePaddle/Paddle/pull/70064)
-- Upgrades to the parameter slicing parallel strategy. [#63542](https://github.com/PaddlePaddle/Paddle/pull/63542), [#67748](https://github.com/PaddlePaddle/Paddle/pull/67748), [#68288](https://github.com/PaddlePaddle/Paddle/pull/68288), [#68314](https://github.com/PaddlePaddle/Paddle/pull/68314), [#69059](https://github.com/PaddlePaddle/Paddle/pull/69059), [#71167](https://github.com/PaddlePaddle/Paddle/pull/71167)
-- Upgrading the pipeline parallelism strategy. [#66810](https://github.com/PaddlePaddle/Paddle/pull/66810), [#67174](https://github.com/PaddlePaddle/Paddle/pull/67174), [#67522](https://github.com/PaddlePaddle/Paddle/pull/67522), [#68141](https://github.com/PaddlePaddle/Paddle/pull/68141), [#68742](https://github.com/PaddlePaddle/Paddle/pull/68742), [#68962](https://github.com/PaddlePaddle/Paddle/pull/68962), [#69052](https://github.com/PaddlePaddle/Paddle/pull/69052), [#69201](https://github.com/PaddlePaddle/Paddle/pull/69201), [#69244](https://github.com/PaddlePaddle/Paddle/pull/69244), [#69578](https://github.com/PaddlePaddle/Paddle/pull/69578), [#69584](https://github.com/PaddlePaddle/Paddle/pull/69584), [#69654](https://github.com/PaddlePaddle/Paddle/pull/69654), [#69799](https://github.com/PaddlePaddle/Paddle/pull/69799), [#69894](https://github.com/PaddlePaddle/Paddle/pull/69894), [#70360](https://github.com/PaddlePaddle/Paddle/pull/70360), [#70615](https://github.com/PaddlePaddle/Paddle/pull/70615)
-- Gradient accumulation strategy upgrade. [#66641](https://github.com/PaddlePaddle/Paddle/pull/66641), [#67254](https://github.com/PaddlePaddle/Paddle/pull/67254), [#67907](https://github.com/PaddlePaddle/Paddle/pull/67907), [#68391](https://github.com/PaddlePaddle/Paddle/pull/68391), [#68460](https://github.com/PaddlePaddle/Paddle/pull/68460), [#68472](https://github.com/PaddlePaddle/Paddle/pull/68472), [#68664](https://github.com/PaddlePaddle/Paddle/pull/68664), [#68727](https://github.com/PaddlePaddle/Paddle/pull/68727), [#69171](https://github.com/PaddlePaddle/Paddle/pull/69171), [#69805](https://github.com/PaddlePaddle/Paddle/pull/69805)
-- Operator fusion strategy upgrade. [#68087](https://github.com/PaddlePaddle/Paddle/pull/68087), [#68207](https://github.com/PaddlePaddle/Paddle/pull/68207), [#68383](https://github.com/PaddlePaddle/Paddle/pull/68383), [#68623](https://github.com/PaddlePaddle/Paddle/pull/68623), [#68650](https://github.com/PaddlePaddle/Paddle/pull/68650), [#68736](https://github.com/PaddlePaddle/Paddle/pull/68736), [#69103](https://github.com/PaddlePaddle/Paddle/pull/69103), [#70889](https://github.com/PaddlePaddle/Paddle/pull/70889)
-- The `tensor_fusion` optimization strategy has been upgraded. [#66130](https://github.com/PaddlePaddle/Paddle/pull/66130), [#68475](https://github.com/PaddlePaddle/Paddle/pull/68475), [#69243](https://github.com/PaddlePaddle/Paddle/pull/69243), [#69560](https://github.com/PaddlePaddle/Paddle/pull/69560), [#69823](https://github.com/PaddlePaddle/Paddle/pull/69823), [#70195](https://github.com/PaddlePaddle/Paddle/pull/70195), [#70309](https://github.com/PaddlePaddle/Paddle/pull/70309), [#70363](https://github.com/PaddlePaddle/Paddle/pull/70363), [#70869](https://github.com/PaddlePaddle/Paddle/pull/70869)
-- Tensor parallel optimization strategy upgrade. [#68182](https://github.com/PaddlePaddle/Paddle/pull/68182), [#68389](https://github.com/PaddlePaddle/Paddle/pull/68389)
-- Upgrade of custom operator segmentation derivation mechanism. [#67614](https://github.com/PaddlePaddle/Paddle/pull/67614)
-- Upgrades to the parameter saving and loading mechanism. [#66416](https://github.com/PaddlePaddle/Paddle/pull/66416), [#67045](https://github.com/PaddlePaddle/Paddle/pull/67045), [#67369](https://github.com/PaddlePaddle/Paddle/pull/67369), [#68203](https://github.com/PaddlePaddle/Paddle/pull/68203)
-- Optimize computation graph compilation time. [#68796](https://github.com/PaddlePaddle/Paddle/pull/68796)
+- Development of the 0-size mechanism for Paddle API. [#72721](https://github.com/PaddlePaddle/Paddle/pull/72721), [#72756](https://github.com/PaddlePaddle/Paddle/pull/72756), [#72790](https://github.com/PaddlePaddle/Paddle/pull/72790), [#72806](https://github.com/PaddlePaddle/Paddle/pull/72806), [#72764](https://github.com/PaddlePaddle/Paddle/pull/72764), [#72786](https://github.com/PaddlePaddle/Paddle/pull/72786), [#72853](https://github.com/PaddlePaddle/Paddle/pull/72853), [#72826](https://github.com/PaddlePaddle/Paddle/pull/72826), [#72851](https://github.com/PaddlePaddle/Paddle/pull/72851), [#72928](https://github.com/PaddlePaddle/Paddle/pull/72928), [#72912](https://github.com/PaddlePaddle/Paddle/pull/72912), [#72922](https://github.com/PaddlePaddle/Paddle/pull/72922), [#72924](https://github.com/PaddlePaddle/Paddle/pull/72924), [#72887](https://github.com/PaddlePaddle/Paddle/pull/72887), [#72921](https://github.com/PaddlePaddle/Paddle/pull/72921), [#72906](https://github.com/PaddlePaddle/Paddle/pull/72906), [#72895](https://github.com/PaddlePaddle/Paddle/pull/72895), [#72821](https://github.com/PaddlePaddle/Paddle/pull/72821), [#72914](https://github.com/PaddlePaddle/Paddle/pull/72914), [#72936](https://github.com/PaddlePaddle/Paddle/pull/72936), [#72943](https://github.com/PaddlePaddle/Paddle/pull/72943), [#72694](https://github.com/PaddlePaddle/Paddle/pull/72694), [#72919](https://github.com/PaddlePaddle/Paddle/pull/72919), [#72940](https://github.com/PaddlePaddle/Paddle/pull/72940), [#72820](https://github.com/PaddlePaddle/Paddle/pull/72820), [#72934](https://github.com/PaddlePaddle/Paddle/pull/72934), [#72975](https://github.com/PaddlePaddle/Paddle/pull/72975), [#72872](https://github.com/PaddlePaddle/Paddle/pull/72872), [#72984](https://github.com/PaddlePaddle/Paddle/pull/72984), [#72988](https://github.com/PaddlePaddle/Paddle/pull/72988), [#72972](https://github.com/PaddlePaddle/Paddle/pull/72972), 
[#72977](https://github.com/PaddlePaddle/Paddle/pull/72977), [#72937](https://github.com/PaddlePaddle/Paddle/pull/72937), [#73086](https://github.com/PaddlePaddle/Paddle/pull/73086), [#73042](https://github.com/PaddlePaddle/Paddle/pull/73042), [#73017](https://github.com/PaddlePaddle/Paddle/pull/73017), [#73044](https://github.com/PaddlePaddle/Paddle/pull/73044), [#73077](https://github.com/PaddlePaddle/Paddle/pull/73077), [#73108](https://github.com/PaddlePaddle/Paddle/pull/73108), [#73027](https://github.com/PaddlePaddle/Paddle/pull/73027), [#72970](https://github.com/PaddlePaddle/Paddle/pull/72970), [#73008](https://github.com/PaddlePaddle/Paddle/pull/73008), [#72996](https://github.com/PaddlePaddle/Paddle/pull/72996), [#73165](https://github.com/PaddlePaddle/Paddle/pull/73165), [#73166](https://github.com/PaddlePaddle/Paddle/pull/73166), [#73170](https://github.com/PaddlePaddle/Paddle/pull/73170), [#73122](https://github.com/PaddlePaddle/Paddle/pull/73122), [#73204](https://github.com/PaddlePaddle/Paddle/pull/73204), [#73207](https://github.com/PaddlePaddle/Paddle/pull/73207), [#73186](https://github.com/PaddlePaddle/Paddle/pull/73186), [#73197](https://github.com/PaddlePaddle/Paddle/pull/73197), [#73168](https://github.com/PaddlePaddle/Paddle/pull/73168), [#73172](https://github.com/PaddlePaddle/Paddle/pull/73172), [#73125](https://github.com/PaddlePaddle/Paddle/pull/73125), [#73181](https://github.com/PaddlePaddle/Paddle/pull/73181), [#73270](https://github.com/PaddlePaddle/Paddle/pull/73270), [#73028](https://github.com/PaddlePaddle/Paddle/pull/73028), [#73094](https://github.com/PaddlePaddle/Paddle/pull/73094), [#73180](https://github.com/PaddlePaddle/Paddle/pull/73180), [#73276](https://github.com/PaddlePaddle/Paddle/pull/73276), [#73333](https://github.com/PaddlePaddle/Paddle/pull/73333), [#73341](https://github.com/PaddlePaddle/Paddle/pull/73341), [#73299](https://github.com/PaddlePaddle/Paddle/pull/73299), 
[#73346](https://github.com/PaddlePaddle/Paddle/pull/73346), [#73361](https://github.com/PaddlePaddle/Paddle/pull/73361), [#73375](https://github.com/PaddlePaddle/Paddle/pull/73375), [#73152](https://github.com/PaddlePaddle/Paddle/pull/73152), [#73377](https://github.com/PaddlePaddle/Paddle/pull/73377), [#73355](https://github.com/PaddlePaddle/Paddle/pull/73355), [#73382](https://github.com/PaddlePaddle/Paddle/pull/73382), [#73385](https://github.com/PaddlePaddle/Paddle/pull/73385), [#73386](https://github.com/PaddlePaddle/Paddle/pull/73386), [#73352](https://github.com/PaddlePaddle/Paddle/pull/73352), [#73387](https://github.com/PaddlePaddle/Paddle/pull/73387), [#73401](https://github.com/PaddlePaddle/Paddle/pull/73401), [#73384](https://github.com/PaddlePaddle/Paddle/pull/73384), [#73450](https://github.com/PaddlePaddle/Paddle/pull/73450), [#73437](https://github.com/PaddlePaddle/Paddle/pull/73437), [#73503](https://github.com/PaddlePaddle/Paddle/pull/73503), [#73507](https://github.com/PaddlePaddle/Paddle/pull/73507), [#73477](https://github.com/PaddlePaddle/Paddle/pull/73477), [#73513](https://github.com/PaddlePaddle/Paddle/pull/73513), [#73525](https://github.com/PaddlePaddle/Paddle/pull/73525), [#73528](https://github.com/PaddlePaddle/Paddle/pull/73528), [#73517](https://github.com/PaddlePaddle/Paddle/pull/73517), [#72898](https://github.com/PaddlePaddle/Paddle/pull/72898), [#72880](https://github.com/PaddlePaddle/Paddle/pull/72880), [#72864](https://github.com/PaddlePaddle/Paddle/pull/72864), [#72993](https://github.com/PaddlePaddle/Paddle/pull/72993), [#72954](https://github.com/PaddlePaddle/Paddle/pull/72954), [#72866](https://github.com/PaddlePaddle/Paddle/pull/72866), [#72878](https://github.com/PaddlePaddle/Paddle/pull/72878), [#72889](https://github.com/PaddlePaddle/Paddle/pull/72889), [#72861](https://github.com/PaddlePaddle/Paddle/pull/72861), [#72837](https://github.com/PaddlePaddle/Paddle/pull/72837)
+- SOT-related enhancements: Enhanced functionalities (such as NumPy interoperability and super support), improved training stability, and fixed multiple issues to enhance code robustness, [#71763](https://github.com/PaddlePaddle/Paddle/pull/71763), [#71666](https://github.com/PaddlePaddle/Paddle/pull/71666), [#71858](https://github.com/PaddlePaddle/Paddle/pull/71858), [#71865](https://github.com/PaddlePaddle/Paddle/pull/71865), [#72474](https://github.com/PaddlePaddle/Paddle/pull/72474), [#72154](https://github.com/PaddlePaddle/Paddle/pull/72154), [#72784](https://github.com/PaddlePaddle/Paddle/pull/72784), [#72956](https://github.com/PaddlePaddle/Paddle/pull/72956), [#73038](https://github.com/PaddlePaddle/Paddle/pull/73038), [#73066](https://github.com/PaddlePaddle/Paddle/pull/73066), [#73287](https://github.com/PaddlePaddle/Paddle/pull/73287), [#73278](https://github.com/PaddlePaddle/Paddle/pull/73278), [#73332](https://github.com/PaddlePaddle/Paddle/pull/73332), [#73372](https://github.com/PaddlePaddle/Paddle/pull/73372), [#73412](https://github.com/PaddlePaddle/Paddle/pull/73412), [#73407](https://github.com/PaddlePaddle/Paddle/pull/73407), [#73506](https://github.com/PaddlePaddle/Paddle/pull/73506)
+- Code style refactoring: Through code refactoring and the unification of cross-platform kernel behaviors, we have improved code quality and maintainability. Additionally, we have added a YAML format pre-commit check tool, as documented in [#72216](https://github.com/PaddlePaddle/Paddle/pull/72216), [#72360](https://github.com/PaddlePaddle/Paddle/pull/72360), [#72816](https://github.com/PaddlePaddle/Paddle/pull/72816), [#72969](https://github.com/PaddlePaddle/Paddle/pull/72969), [#73106](https://github.com/PaddlePaddle/Paddle/pull/73106), [#72825](https://github.com/PaddlePaddle/Paddle/pull/72825), [#73150](https://github.com/PaddlePaddle/Paddle/pull/73150), [#73151](https://github.com/PaddlePaddle/Paddle/pull/73151), [#73158](https://github.com/PaddlePaddle/Paddle/pull/73158), [#73101](https://github.com/PaddlePaddle/Paddle/pull/73101), [#73326](https://github.com/PaddlePaddle/Paddle/pull/73326), [#72580](https://github.com/PaddlePaddle/Paddle/pull/72580), and [#72424](https://github.com/PaddlePaddle/Paddle/pull/72424)
+- Kernel accuracy: Rolled out fixes for Paddle CPU/GPU kernel accuracy issues across the board. [#72879](https://github.com/PaddlePaddle/Paddle/pull/72879), [#72894](https://github.com/PaddlePaddle/Paddle/pull/72894), [#73012](https://github.com/PaddlePaddle/Paddle/pull/73012), [#72973](https://github.com/PaddlePaddle/Paddle/pull/72973), [#73018](https://github.com/PaddlePaddle/Paddle/pull/73018), [#72965](https://github.com/PaddlePaddle/Paddle/pull/72965), [#73128](https://github.com/PaddlePaddle/Paddle/pull/73128), [#73229](https://github.com/PaddlePaddle/Paddle/pull/73229), [#72992](https://github.com/PaddlePaddle/Paddle/pull/72992), [#73344](https://github.com/PaddlePaddle/Paddle/pull/73344), [#73274](https://github.com/PaddlePaddle/Paddle/pull/73274), [#73295](https://github.com/PaddlePaddle/Paddle/pull/73295), [#73293](https://github.com/PaddlePaddle/Paddle/pull/73293), [#73317](https://github.com/PaddlePaddle/Paddle/pull/73317), [#73320](https://github.com/PaddlePaddle/Paddle/pull/73320), [#73454](https://github.com/PaddlePaddle/Paddle/pull/73454), [#73492](https://github.com/PaddlePaddle/Paddle/pull/73492), [#73535](https://github.com/PaddlePaddle/Paddle/pull/73535)
+- Slice issue fixes: Fixed issues related to slices, including indexing logic, performance optimization, etc., [#72644](https://github.com/PaddlePaddle/Paddle/pull/72644), [#72676](https://github.com/PaddlePaddle/Paddle/pull/72676), [#72838](https://github.com/PaddlePaddle/Paddle/pull/72838), [#72966](https://github.com/PaddlePaddle/Paddle/pull/72966), [#73095](https://github.com/PaddlePaddle/Paddle/pull/73095), [#72840](https://github.com/PaddlePaddle/Paddle/pull/72840), [#73112](https://github.com/PaddlePaddle/Paddle/pull/73112), [#73367](https://github.com/PaddlePaddle/Paddle/pull/73367), [#73390](https://github.com/PaddlePaddle/Paddle/pull/73390), [#73307](https://github.com/PaddlePaddle/Paddle/pull/73307), [#73465](https://github.com/PaddlePaddle/Paddle/pull/73465), [#73362](https://github.com/PaddlePaddle/Paddle/pull/73362), [#72733](https://github.com/PaddlePaddle/Paddle/pull/72733), [#72886](https://github.com/PaddlePaddle/Paddle/pull/72886)
+- Performance optimization: Optimized index logic to improve overall performance. [#72707](https://github.com/PaddlePaddle/Paddle/pull/72707), [#73485](https://github.com/PaddlePaddle/Paddle/pull/73485)
+- Other significant improvements: including dynamic shape support, fixing meshgrid and adding unit tests, upgrading CUB to version 2.1.0, improving FP8 numerical processing, optimizing the CUDA graph shared pool mechanism, removing ShadowFeedOp to simplify data flow, enhancing version compatibility for PIR model saving/loading, fixing flip and reverse kernel issues, improving the NaN propagation logic of paddle.angle, introducing an asynchronous GC check mechanism, optimizing the Scope lock-free interface of Dy2St, cleaning up unused third-party dependencies (absl), and further promoting the decoupling of PHI and Fluid to enhance the framework's stability, performance, and scalability. [#72356](https://github.com/PaddlePaddle/Paddle/pull/72356), [#72380](https://github.com/PaddlePaddle/Paddle/pull/72380), [#72633](https://github.com/PaddlePaddle/Paddle/pull/72633), [#72794](https://github.com/PaddlePaddle/Paddle/pull/72794), [#72917](https://github.com/PaddlePaddle/Paddle/pull/72917), [#72920](https://github.com/PaddlePaddle/Paddle/pull/72920), [#72945](https://github.com/PaddlePaddle/Paddle/pull/72945), [#72620](https://github.com/PaddlePaddle/Paddle/pull/72620), [#73011](https://github.com/PaddlePaddle/Paddle/pull/73011), [#73051](https://github.com/PaddlePaddle/Paddle/pull/73051), [#73052](https://github.com/PaddlePaddle/Paddle/pull/73052), [#73075](https://github.com/PaddlePaddle/Paddle/pull/73075), [#73176](https://github.com/PaddlePaddle/Paddle/pull/73176), [#73191](https://github.com/PaddlePaddle/Paddle/pull/73191), [#73337](https://github.com/PaddlePaddle/Paddle/pull/73337), [#73311](https://github.com/PaddlePaddle/Paddle/pull/73311), [#73173](https://github.com/PaddlePaddle/Paddle/pull/73173), [#73239](https://github.com/PaddlePaddle/Paddle/pull/73239), [#73448](https://github.com/PaddlePaddle/Paddle/pull/73448), [#73478](https://github.com/PaddlePaddle/Paddle/pull/73478), [#73522](https://github.com/PaddlePaddle/Paddle/pull/73522), 
[#73369](https://github.com/PaddlePaddle/Paddle/pull/73369)
-### Bug fixes
+### Performance
-- Fixed bugs in the segmentation derivation mechanism and the segmentation derivation rules for several operators. [#65702](https://github.com/PaddlePaddle/Paddle/pull/65702), [#65835](https://github.com/PaddlePaddle/Paddle/pull/65835), [#66098](https://github.com/PaddlePaddle/Paddle/pull/66098), [#66955](https://github.com/PaddlePaddle/Paddle/pull/66955), [#67052](https://github.com/PaddlePaddle/Paddle/pull/67052), [#67059](https://github.com/PaddlePaddle/Paddle/pull/67059), [#67101](https://github.com/PaddlePaddle/Paddle/pull/67101), [#67283](https://github.com/PaddlePaddle/Paddle/pull/67283), [#67729](https://github.com/PaddlePaddle/Paddle/pull/67729), [#67996](https://github.com/PaddlePaddle/Paddle/pull/67996), [#68413](https://github.com/PaddlePaddle/Paddle/pull/68413), [#68455](https://github.com/PaddlePaddle/Paddle/pull/68455), [#68533](https://github.com/PaddlePaddle/Paddle/pull/68533), [#68976](https://github.com/PaddlePaddle/Paddle/pull/68976), [#68977](https://github.com/PaddlePaddle/Paddle/pull/68977), [#69027](https://github.com/PaddlePaddle/Paddle/pull/69027), [#69203](https://github.com/PaddlePaddle/Paddle/pull/69203), [#69223](https://github.com/PaddlePaddle/Paddle/pull/69223), [#69862](https://github.com/PaddlePaddle/Paddle/pull/69862), [#69991](https://github.com/PaddlePaddle/Paddle/pull/69991), [#70100](https://github.com/PaddlePaddle/Paddle/pull/70100), [#70624](https://github.com/PaddlePaddle/Paddle/pull/70624), [#71024](https://github.com/PaddlePaddle/Paddle/pull/71024), [#71152](https://github.com/PaddlePaddle/Paddle/pull/71152), [#71214](https://github.com/PaddlePaddle/Paddle/pull/71214), [#71253](https://github.com/PaddlePaddle/Paddle/pull/71253), [#71388](https://github.com/PaddlePaddle/Paddle/pull/71388)
-- Fixed several bugs in the segmentation conversion mechanism. [#65060](https://github.com/PaddlePaddle/Paddle/pull/65060), [#65820](https://github.com/PaddlePaddle/Paddle/pull/65820), [#67630](https://github.com/PaddlePaddle/Paddle/pull/67630), [#67809](https://github.com/PaddlePaddle/Paddle/pull/67809), [#68115](https://github.com/PaddlePaddle/Paddle/pull/68115), [#68468](https://github.com/PaddlePaddle/Paddle/pull/68468), [#70023](https://github.com/PaddlePaddle/Paddle/pull/70023)
-- Fixed the bug of incorrect derivation of `shard_degree` in parameter slice parallelism. [#68781](https://github.com/PaddlePaddle/Paddle/pull/68781), [#69214](https://github.com/PaddlePaddle/Paddle/pull/69214)
-- Fixed issues in scenarios such as inconsistent results between dynamic and static graphs in `shard_dataloader`, slicing dict-type data, and custom `sampler` scenarios. [#65262](https://github.com/PaddlePaddle/Paddle/pull/65262), [#66096](https://github.com/PaddlePaddle/Paddle/pull/66096), [#66882](https://github.com/PaddlePaddle/Paddle/pull/66882), [#69620](https://github.com/PaddlePaddle/Paddle/pull/69620)
-- Fixed the bug where the `recompute` setting with `use_reentrant=false` was incompatible with parameter slicing. [#65188](https://github.com/PaddlePaddle/Paddle/pull/65188)
-- Fixed bugs in the parameter loading and saving functions. [#66266](https://github.com/PaddlePaddle/Paddle/pull/66266), [#69764](https://github.com/PaddlePaddle/Paddle/pull/69764)
-- Fixed bugs in operators such as `Conv2D`, `fill_constant`, `flash_attn_grad`, `reduce_scatter`, `if`, `tuple_push`, and `tuple_pop`. [#67587](https://github.com/PaddlePaddle/Paddle/pull/67587), [#68008](https://github.com/PaddlePaddle/Paddle/pull/68008), [#68586](https://github.com/PaddlePaddle/Paddle/pull/68586), [#68589](https://github.com/PaddlePaddle/Paddle/pull/68589), [#69519](https://github.com/PaddlePaddle/Paddle/pull/69519), [#70207](https://github.com/PaddlePaddle/Paddle/pull/70207)
-- Fixed bugs in communication operators such as `reduce_scatter`, `p_send`, and `p_recv`. [#67386](https://github.com/PaddlePaddle/Paddle/pull/67386), [#71433](https://github.com/PaddlePaddle/Paddle/pull/71433)
-- Fixed bugs related to tensor type promotion. [#66541](https://github.com/PaddlePaddle/Paddle/pull/66541), [#68342](https://github.com/PaddlePaddle/Paddle/pull/68342)
-- Fixed the bug where automatic allocation of GPU memory occurred when converting uninitialized distributed tensors to NumPy arrays on some cards. [#66361](https://github.com/PaddlePaddle/Paddle/pull/66361)
-- Fixed the bug that triggered data copying when calling `to_tensor` on non-segmented tensors. [#67169](https://github.com/PaddlePaddle/Paddle/pull/67169)
-- Fixed the bug related to the segmentation of the `scaler` parameter. [#68289](https://github.com/PaddlePaddle/Paddle/pull/68289)
-- Fixed the accuracy issue of `enable_delay_scale_loss`. [#68525](https://github.com/PaddlePaddle/Paddle/pull/68525)
-- Fixed the hang issue caused by different creation orders of communication groups. [#68847](https://github.com/PaddlePaddle/Paddle/pull/68847)
-- Fixed the bug of incorrect `op_role` setting in static graph scenarios. [#67850](https://github.com/PaddlePaddle/Paddle/pull/67850), [#67986](https://github.com/PaddlePaddle/Paddle/pull/67986), [#68156](https://github.com/PaddlePaddle/Paddle/pull/68156)
-- Fixed the bug where the output variable of the random number operator could not be sliced in static graphs. [#67589](https://github.com/PaddlePaddle/Paddle/pull/67589), [#67750](https://github.com/PaddlePaddle/Paddle/pull/67750), [#68067](https://github.com/PaddlePaddle/Paddle/pull/68067)
-- Fixed the bug where the graph cache mechanism failed in static graphs. [#68488](https://github.com/PaddlePaddle/Paddle/pull/68488)
-- Fixed the bug of index out-of-bounds in `paddle.distributed.to_distributed`. [#70174](https://github.com/PaddlePaddle/Paddle/pull/70174)
-- Fixed a bug in the pipeline parallel visualization tool. [#71386](https://github.com/PaddlePaddle/Paddle/pull/71386)
-
-## 5. Operator mechanism
-
-Operator-related PRs, including the splitting of combined operators, the adaptation of new hardware-compatible operator kernels, sparse operator operations, and the retirement of old IR operators, have laid the foundation for PIR-compatible compilers and achieving performance advantages across multiple hardware platforms. The standardization of the operator system has optimized the code structure, reduced technical debt, and improved maintainability.
+- SOT-related: Through improvements such as optimizing the Guard condition mechanism, enhancing dynamic shape processing capabilities, and adding support for no_grad, execution efficiency has been enhanced, functional features have been expanded, and the code structure and performance have been optimized. [#70362](https://github.com/PaddlePaddle/Paddle/pull/70362), [#70154](https://github.com/PaddlePaddle/Paddle/pull/70154), [#71748](https://github.com/PaddlePaddle/Paddle/pull/71748), [#72004](https://github.com/PaddlePaddle/Paddle/pull/72004), [#72159](https://github.com/PaddlePaddle/Paddle/pull/72159), [#72174](https://github.com/PaddlePaddle/Paddle/pull/72174), [#71994](https://github.com/PaddlePaddle/Paddle/pull/71994), [#72250](https://github.com/PaddlePaddle/Paddle/pull/72250), [#72285](https://github.com/PaddlePaddle/Paddle/pull/72285), [#72322](https://github.com/PaddlePaddle/Paddle/pull/72322), [#72272](https://github.com/PaddlePaddle/Paddle/pull/72272), [#72417](https://github.com/PaddlePaddle/Paddle/pull/72417), [#72438](https://github.com/PaddlePaddle/Paddle/pull/72438), [#72462](https://github.com/PaddlePaddle/Paddle/pull/72462), [#72463](https://github.com/PaddlePaddle/Paddle/pull/72463), [#72503](https://github.com/PaddlePaddle/Paddle/pull/72503), [#72501](https://github.com/PaddlePaddle/Paddle/pull/72501), [#72521](https://github.com/PaddlePaddle/Paddle/pull/72521), [#72509](https://github.com/PaddlePaddle/Paddle/pull/72509), [#72544](https://github.com/PaddlePaddle/Paddle/pull/72544), [#73469](https://github.com/PaddlePaddle/Paddle/pull/73469), [#73471](https://github.com/PaddlePaddle/Paddle/pull/73471), [#73555](https://github.com/PaddlePaddle/Paddle/pull/73555)
-### New Features
+### Deprecations
-- Support the splitting of combinatory operators. [#65148](https://github.com/PaddlePaddle/Paddle/pull/65148), [#65007](https://github.com/PaddlePaddle/Paddle/pull/65007), [#65482](https://github.com/PaddlePaddle/Paddle/pull/65482), [#65006](https://github.com/PaddlePaddle/Paddle/pull/65006), [#65692](https://github.com/PaddlePaddle/Paddle/pull/65692), [#65961](https://github.com/PaddlePaddle/Paddle/pull/65961), [#65968](https://github.com/PaddlePaddle/Paddle/pull/65968), [#65967](https://github.com/PaddlePaddle/Paddle/pull/65967), [#66510](https://github.com/PaddlePaddle/Paddle/pull/66510), [#66795](https://github.com/PaddlePaddle/Paddle/pull/66795), [#66835](https://github.com/PaddlePaddle/Paddle/pull/66835), [#67151](https://github.com/PaddlePaddle/Paddle/pull/67151), [#67342](https://github.com/PaddlePaddle/Paddle/pull/67342), [#67481](https://github.com/PaddlePaddle/Paddle/pull/67481), [#67502](https://github.com/PaddlePaddle/Paddle/pull/67502), [#67606](https://github.com/PaddlePaddle/Paddle/pull/67606), [#67757](https://github.com/PaddlePaddle/Paddle/pull/67757), [#67775](https://github.com/PaddlePaddle/Paddle/pull/67775), [#67891](https://github.com/PaddlePaddle/Paddle/pull/67891), [#67790](https://github.com/PaddlePaddle/Paddle/pull/67790), [#67965](https://github.com/PaddlePaddle/Paddle/pull/67965), [#67968](https://github.com/PaddlePaddle/Paddle/pull/67968), [#68168](https://github.com/PaddlePaddle/Paddle/pull/68168), [#68125](https://github.com/PaddlePaddle/Paddle/pull/68125), [#68228](https://github.com/PaddlePaddle/Paddle/pull/68228), [#68295](https://github.com/PaddlePaddle/Paddle/pull/68295), [#68353](https://github.com/PaddlePaddle/Paddle/pull/68353), [#68357](https://github.com/PaddlePaddle/Paddle/pull/68357), [#68827](https://github.com/PaddlePaddle/Paddle/pull/68827), [#68834](https://github.com/PaddlePaddle/Paddle/pull/68834), [#69239](https://github.com/PaddlePaddle/Paddle/pull/69239), 
[#68817](https://github.com/PaddlePaddle/Paddle/pull/68817), [#69108](https://github.com/PaddlePaddle/Paddle/pull/69108), [#69373](https://github.com/PaddlePaddle/Paddle/pull/69373), [#69372](https://github.com/PaddlePaddle/Paddle/pull/69372), [#68829](https://github.com/PaddlePaddle/Paddle/pull/68829), [#69684](https://github.com/PaddlePaddle/Paddle/pull/69684), [#68818](https://github.com/PaddlePaddle/Paddle/pull/68818), [#68835](https://github.com/PaddlePaddle/Paddle/pull/68835), [#69838](https://github.com/PaddlePaddle/Paddle/pull/69838), [#69998](https://github.com/PaddlePaddle/Paddle/pull/69998), [#69675](https://github.com/PaddlePaddle/Paddle/pull/69675), [#70367](https://github.com/PaddlePaddle/Paddle/pull/70367), [#70080](https://github.com/PaddlePaddle/Paddle/pull/70080), [#71352](https://github.com/PaddlePaddle/Paddle/pull/71352), [#66450](https://github.com/PaddlePaddle/Paddle/pull/66450), [#67593](https://github.com/PaddlePaddle/Paddle/pull/67593), [#67988](https://github.com/PaddlePaddle/Paddle/pull/67988), [#68346](https://github.com/PaddlePaddle/Paddle/pull/68346), [#68399](https://github.com/PaddlePaddle/Paddle/pull/68399), [#68319](https://github.com/PaddlePaddle/Paddle/pull/68319), [#68485](https://github.com/PaddlePaddle/Paddle/pull/68485), [#68961](https://github.com/PaddlePaddle/Paddle/pull/68961), [#68575](https://github.com/PaddlePaddle/Paddle/pull/68575)
-- PIR supports Pylayer. [#69674](https://github.com/PaddlePaddle/Paddle/pull/69674), [#70375](https://github.com/PaddlePaddle/Paddle/pull/70375)
-- Support for XPU-related operator computations. [#65684](https://github.com/PaddlePaddle/Paddle/pull/65684), [#65976](https://github.com/PaddlePaddle/Paddle/pull/65976), [#68497](https://github.com/PaddlePaddle/Paddle/pull/68497)
-- PIR supports sparse operators. [#62663](https://github.com/PaddlePaddle/Paddle/pull/62663), [#67885](https://github.com/PaddlePaddle/Paddle/pull/67885), [#67976](https://github.com/PaddlePaddle/Paddle/pull/67976), [#68261](https://github.com/PaddlePaddle/Paddle/pull/68261), [#68326](https://github.com/PaddlePaddle/Paddle/pull/68326)
-- Support manual Recompute. [#65879](https://github.com/PaddlePaddle/Paddle/pull/65879)
-- Implement the kernel and register the operator. [#63130](https://github.com/PaddlePaddle/Paddle/pull/63130)
-- Support for Custom Op. [#68824](https://github.com/PaddlePaddle/Paddle/pull/68824), [#68748](https://github.com/PaddlePaddle/Paddle/pull/68748)
-- Added dynamic graph second-order inverse composition for acos. [#70409](https://github.com/PaddlePaddle/Paddle/pull/70409)
-- Support initialization and computation of 0-size tensors. [#70504](https://github.com/PaddlePaddle/Paddle/pull/70504)
+- Code cleanup: Removed Python 3.8 support declarations and completed the related code cleanup, dependency streamlining, and syntax modernization to improve maintainability and compatibility. [#71815](https://github.com/PaddlePaddle/Paddle/pull/71815), [#72802](https://github.com/PaddlePaddle/Paddle/pull/72802), [#72856](https://github.com/PaddlePaddle/Paddle/pull/72856), [#72854](https://github.com/PaddlePaddle/Paddle/pull/72854), [#72855](https://github.com/PaddlePaddle/Paddle/pull/72855), [#72873](https://github.com/PaddlePaddle/Paddle/pull/72873), [#72870](https://github.com/PaddlePaddle/Paddle/pull/72870), [#72868](https://github.com/PaddlePaddle/Paddle/pull/72868), [#72891](https://github.com/PaddlePaddle/Paddle/pull/72891)
-### Bug Fixes
+### Devs
-- Fixed bugs related to composite operators. [#70250](https://github.com/PaddlePaddle/Paddle/pull/70250), [#67170](https://github.com/PaddlePaddle/Paddle/pull/67170), [#71218](https://github.com/PaddlePaddle/Paddle/pull/71218), [#69095](https://github.com/PaddlePaddle/Paddle/pull/69095), [#70189](https://github.com/PaddlePaddle/Paddle/pull/70189)
-- Fixed XPU-related bugs. [#65149](https://github.com/PaddlePaddle/Paddle/pull/65149), [#70845](https://github.com/PaddlePaddle/Paddle/pull/70845)
-- Fixed shape-related bugs. [#68722](https://github.com/PaddlePaddle/Paddle/pull/68722), [#70210](https://github.com/PaddlePaddle/Paddle/pull/70210), [#70492](https://github.com/PaddlePaddle/Paddle/pull/70492)
-- Fixed save/load-related bugs. [#69153](https://github.com/PaddlePaddle/Paddle/pull/69153)
-- Fixed bugs related to types. [#65721](https://github.com/PaddlePaddle/Paddle/pull/65721), [#65859](https://github.com/PaddlePaddle/Paddle/pull/65859)
-- Fixing issues during the invocation and execution of other operators, including type matching, type inference, parameter type support, etc,. [#65360](https://github.com/PaddlePaddle/Paddle/pull/65360), [#65024](https://github.com/PaddlePaddle/Paddle/pull/65024), [#66308](https://github.com/PaddlePaddle/Paddle/pull/66308), [#67085](https://github.com/PaddlePaddle/Paddle/pull/67085), [#67285](https://github.com/PaddlePaddle/Paddle/pull/67285), [#67076](https://github.com/PaddlePaddle/Paddle/pull/67076), [#67547](https://github.com/PaddlePaddle/Paddle/pull/67547), [#68007](https://github.com/PaddlePaddle/Paddle/pull/68007), [#68527](https://github.com/PaddlePaddle/Paddle/pull/68527), [#68549](https://github.com/PaddlePaddle/Paddle/pull/68549), [#68543](https://github.com/PaddlePaddle/Paddle/pull/68543), [#68604](https://github.com/PaddlePaddle/Paddle/pull/68604), [#68741](https://github.com/PaddlePaddle/Paddle/pull/68741), [#68859](https://github.com/PaddlePaddle/Paddle/pull/68859), [#69025](https://github.com/PaddlePaddle/Paddle/pull/69025), [#69065](https://github.com/PaddlePaddle/Paddle/pull/69065), [#69405](https://github.com/PaddlePaddle/Paddle/pull/69405), [#69688](https://github.com/PaddlePaddle/Paddle/pull/69688), [#69912](https://github.com/PaddlePaddle/Paddle/pull/69912), [#70177](https://github.com/PaddlePaddle/Paddle/pull/70177), [#70517](https://github.com/PaddlePaddle/Paddle/pull/70517), [#70596](https://github.com/PaddlePaddle/Paddle/pull/70596), [#70788](https://github.com/PaddlePaddle/Paddle/pull/70788), [#70870](https://github.com/PaddlePaddle/Paddle/pull/70870), [#71332](https://github.com/PaddlePaddle/Paddle/pull/71332), [#71454](https://github.com/PaddlePaddle/Paddle/pull/71454), [#71442](https://github.com/PaddlePaddle/Paddle/pull/71442), [#71499](https://github.com/PaddlePaddle/Paddle/pull/71499), [#67459](https://github.com/PaddlePaddle/Paddle/pull/67459), [#68470](https://github.com/PaddlePaddle/Paddle/pull/68470), 
[#70206](https://github.com/PaddlePaddle/Paddle/pull/70206)
+- Optimized CINN backend integration and dynamic shape processing logic, improved framework stability through code structure refactoring and test reinforcement, and added debug log functionality to enhance maintainability. [#71817](https://github.com/PaddlePaddle/Paddle/pull/71817), [#71896](https://github.com/PaddlePaddle/Paddle/pull/71896), [#71984](https://github.com/PaddlePaddle/Paddle/pull/71984), [#72067](https://github.com/PaddlePaddle/Paddle/pull/72067), [#72165](https://github.com/PaddlePaddle/Paddle/pull/72165), [#72207](https://github.com/PaddlePaddle/Paddle/pull/72207), [#72235](https://github.com/PaddlePaddle/Paddle/pull/72235), [#72273](https://github.com/PaddlePaddle/Paddle/pull/72273), [#72326](https://github.com/PaddlePaddle/Paddle/pull/72326), [#72400](https://github.com/PaddlePaddle/Paddle/pull/72400), [#72381](https://github.com/PaddlePaddle/Paddle/pull/72381), [#72560](https://github.com/PaddlePaddle/Paddle/pull/72560), [#72783](https://github.com/PaddlePaddle/Paddle/pull/72783), [#73530](https://github.com/PaddlePaddle/Paddle/pull/73530)
### Others
-- Optimize code style. [#68536](https://github.com/PaddlePaddle/Paddle/pull/68536)
-- Fix spelling errors. [#67456](https://github.com/PaddlePaddle/Paddle/pull/67456), [#66673](https://github.com/PaddlePaddle/Paddle/pull/66673), [#68702](https://github.com/PaddlePaddle/Paddle/pull/68702), [#68735](https://github.com/PaddlePaddle/Paddle/pull/68735), [#68718](https://github.com/PaddlePaddle/Paddle/pull/68718), [#70700](https://github.com/PaddlePaddle/Paddle/pull/70700), [#70682](https://github.com/PaddlePaddle/Paddle/pull/70682), [#70670](https://github.com/PaddlePaddle/Paddle/pull/70670), [#70241](https://github.com/PaddlePaddle/Paddle/pull/70241), [#69626](https://github.com/PaddlePaddle/Paddle/pull/69626), [#70051](https://github.com/PaddlePaddle/Paddle/pull/70051), [#67764](https://github.com/PaddlePaddle/Paddle/pull/67764), [#68872](https://github.com/PaddlePaddle/Paddle/pull/68872), [#70055](https://github.com/PaddlePaddle/Paddle/pull/70055), [#67954](https://github.com/PaddlePaddle/Paddle/pull/67954), [#67404](https://github.com/PaddlePaddle/Paddle/pull/67404), [#69273](https://github.com/PaddlePaddle/Paddle/pull/69273), [#66981](https://github.com/PaddlePaddle/Paddle/pull/66981), [#68145](https://github.com/PaddlePaddle/Paddle/pull/68145), [#69148](https://github.com/PaddlePaddle/Paddle/pull/69148), [#69145](https://github.com/PaddlePaddle/Paddle/pull/69145), [#69168](https://github.com/PaddlePaddle/Paddle/pull/69168), [#68940](https://github.com/PaddlePaddle/Paddle/pull/68940), [#70344](https://github.com/PaddlePaddle/Paddle/pull/70344)
-- Modify the interface documentation. [#69378](https://github.com/PaddlePaddle/Paddle/pull/69378)
-- Replaced operator and parameter naming under the fluid operator system. [#69345](https://github.com/PaddlePaddle/Paddle/pull/69345), [#69382](https://github.com/PaddlePaddle/Paddle/pull/69382), [#69484](https://github.com/PaddlePaddle/Paddle/pull/69484), [#69444](https://github.com/PaddlePaddle/Paddle/pull/69444)
+- Others: Added CPU kernel support for the FP16/BF16 data types, optimized error handling and tolerance configuration in the testing module, etc. [#71764](https://github.com/PaddlePaddle/Paddle/pull/71764), [#71951](https://github.com/PaddlePaddle/Paddle/pull/71951), [#72944](https://github.com/PaddlePaddle/Paddle/pull/72944)
-### Discarded
+## 3. CINN
-- xshape output exit. [#66769](https://github.com/PaddlePaddle/Paddle/pull/66769), [#67009](https://github.com/PaddlePaddle/Paddle/pull/67009), [#67152](https://github.com/PaddlePaddle/Paddle/pull/67152), [#67172](https://github.com/PaddlePaddle/Paddle/pull/67172), [#67355](https://github.com/PaddlePaddle/Paddle/pull/67355), [#67373](https://github.com/PaddlePaddle/Paddle/pull/67373), [#66089](https://github.com/PaddlePaddle/Paddle/pull/66089)
-- Remove the obsolete operators, their kernels, related unit tests, and related calling codes under the fluid system. [#67370](https://github.com/PaddlePaddle/Paddle/pull/67370), [#67088](https://github.com/PaddlePaddle/Paddle/pull/67088), [#67324](https://github.com/PaddlePaddle/Paddle/pull/67324), [#67666](https://github.com/PaddlePaddle/Paddle/pull/67666), [#68058](https://github.com/PaddlePaddle/Paddle/pull/68058), [#68311](https://github.com/PaddlePaddle/Paddle/pull/68311), [#68358](https://github.com/PaddlePaddle/Paddle/pull/68358), [#68312](https://github.com/PaddlePaddle/Paddle/pull/68312), [#68355](https://github.com/PaddlePaddle/Paddle/pull/68355), [#67528](https://github.com/PaddlePaddle/Paddle/pull/67528), [#68316](https://github.com/PaddlePaddle/Paddle/pull/68316), [#68356](https://github.com/PaddlePaddle/Paddle/pull/68356), [#68397](https://github.com/PaddlePaddle/Paddle/pull/68397), [#68441](https://github.com/PaddlePaddle/Paddle/pull/68441), [#68417](https://github.com/PaddlePaddle/Paddle/pull/68417), [#68567](https://github.com/PaddlePaddle/Paddle/pull/68567), [#68583](https://github.com/PaddlePaddle/Paddle/pull/68583), [#68649](https://github.com/PaddlePaddle/Paddle/pull/68649), [#68331](https://github.com/PaddlePaddle/Paddle/pull/68331), [#68730](https://github.com/PaddlePaddle/Paddle/pull/68730), [#69754](https://github.com/PaddlePaddle/Paddle/pull/69754), [#69445](https://github.com/PaddlePaddle/Paddle/pull/69445), [#69921](https://github.com/PaddlePaddle/Paddle/pull/69921), [#70268](https://github.com/PaddlePaddle/Paddle/pull/70268), [#69446](https://github.com/PaddlePaddle/Paddle/pull/69446), [#69544](https://github.com/PaddlePaddle/Paddle/pull/69544), [#70272](https://github.com/PaddlePaddle/Paddle/pull/70272), [#69745](https://github.com/PaddlePaddle/Paddle/pull/69745), [#70300](https://github.com/PaddlePaddle/Paddle/pull/70300), [#70388](https://github.com/PaddlePaddle/Paddle/pull/70388), 
[#70421](https://github.com/PaddlePaddle/Paddle/pull/70421), [#70302](https://github.com/PaddlePaddle/Paddle/pull/70302), [#70445](https://github.com/PaddlePaddle/Paddle/pull/70445), [#69275](https://github.com/PaddlePaddle/Paddle/pull/69275), [#69081](https://github.com/PaddlePaddle/Paddle/pull/69081), [#70588](https://github.com/PaddlePaddle/Paddle/pull/70588), [#67778](https://github.com/PaddlePaddle/Paddle/pull/67778), [#67953](https://github.com/PaddlePaddle/Paddle/pull/67953), [#68093](https://github.com/PaddlePaddle/Paddle/pull/68093), [#68092](https://github.com/PaddlePaddle/Paddle/pull/68092), [#67684](https://github.com/PaddlePaddle/Paddle/pull/67684), [#69665](https://github.com/PaddlePaddle/Paddle/pull/69665), [#67915](https://github.com/PaddlePaddle/Paddle/pull/67915), [#67917](https://github.com/PaddlePaddle/Paddle/pull/67917), [#68403](https://github.com/PaddlePaddle/Paddle/pull/68403), [#68404](https://github.com/PaddlePaddle/Paddle/pull/68404), [#68969](https://github.com/PaddlePaddle/Paddle/pull/68969), [#68953](https://github.com/PaddlePaddle/Paddle/pull/68953), [#68954](https://github.com/PaddlePaddle/Paddle/pull/68954), [#68942](https://github.com/PaddlePaddle/Paddle/pull/68942), [#68950](https://github.com/PaddlePaddle/Paddle/pull/68950), [#69381](https://github.com/PaddlePaddle/Paddle/pull/69381), [#69380](https://github.com/PaddlePaddle/Paddle/pull/69380), [#69448](https://github.com/PaddlePaddle/Paddle/pull/69448), [#69680](https://github.com/PaddlePaddle/Paddle/pull/69680), [#69775](https://github.com/PaddlePaddle/Paddle/pull/69775), [#69812](https://github.com/PaddlePaddle/Paddle/pull/69812), [#69840](https://github.com/PaddlePaddle/Paddle/pull/69840), [#69828](https://github.com/PaddlePaddle/Paddle/pull/69828), [#69742](https://github.com/PaddlePaddle/Paddle/pull/69742), [#69923](https://github.com/PaddlePaddle/Paddle/pull/69923), [#69922](https://github.com/PaddlePaddle/Paddle/pull/69922), 
[#69904](https://github.com/PaddlePaddle/Paddle/pull/69904), [#70002](https://github.com/PaddlePaddle/Paddle/pull/70002), [#70054](https://github.com/PaddlePaddle/Paddle/pull/70054), [#70052](https://github.com/PaddlePaddle/Paddle/pull/70052), [#70053](https://github.com/PaddlePaddle/Paddle/pull/70053), [#70713](https://github.com/PaddlePaddle/Paddle/pull/70713), [#70718](https://github.com/PaddlePaddle/Paddle/pull/70718), [#70718](https://github.com/PaddlePaddle/Paddle/pull/70718), [#70717](https://github.com/PaddlePaddle/Paddle/pull/70717)
-- Remove deprecated flags. [#70727](https://github.com/PaddlePaddle/Paddle/pull/70727), [#70726](https://github.com/PaddlePaddle/Paddle/pull/70726)
-- Remove the deprecated API of combination operators. [#69873](https://github.com/PaddlePaddle/Paddle/pull/69873), [#69309](https://github.com/PaddlePaddle/Paddle/pull/69309)
+Optimized compiler performance and enhanced stability.
-### Developer-related
+### Performance
-- Support for composition operators, including adapter operators, adding flags, test cases, etc. [#67725](https://github.com/PaddlePaddle/Paddle/pull/67725), [#65252](https://github.com/PaddlePaddle/Paddle/pull/65252), [#67590](https://github.com/PaddlePaddle/Paddle/pull/67590), [#68076](https://github.com/PaddlePaddle/Paddle/pull/68076), [#66711](https://github.com/PaddlePaddle/Paddle/pull/66711), [#68813](https://github.com/PaddlePaddle/Paddle/pull/68813), [#68928](https://github.com/PaddlePaddle/Paddle/pull/68928), [#69054](https://github.com/PaddlePaddle/Paddle/pull/69054), [#69156](https://github.com/PaddlePaddle/Paddle/pull/69156), [#69255](https://github.com/PaddlePaddle/Paddle/pull/69255), [#69460](https://github.com/PaddlePaddle/Paddle/pull/69460), [#70270](https://github.com/PaddlePaddle/Paddle/pull/70270)
-- Add unit tests for operators. [#68272](https://github.com/PaddlePaddle/Paddle/pull/68272), [#68490](https://github.com/PaddlePaddle/Paddle/pull/68490)
-- Added operator API aliases for PaddleCustomDevice. [#69526](https://github.com/PaddlePaddle/Paddle/pull/69526)
-- Define the position of the shift operator to ensure it only supports dynamic graphs. [#69289](https://github.com/PaddlePaddle/Paddle/pull/69289)
-- Annotate only forward computation operators. [#68580](https://github.com/PaddlePaddle/Paddle/pull/68580)
-- Change the inverse operator of the view operation to reuse the forward operator, thereby supporting the need for higher-order differentiation in scientific computing scenarios. [#71086](https://github.com/PaddlePaddle/Paddle/pull/71086)
-- Migrate operator file location/modify function namespace/modify function parameter names, etc. [#66393](https://github.com/PaddlePaddle/Paddle/pull/66393), [#67066](https://github.com/PaddlePaddle/Paddle/pull/67066), [#67012](https://github.com/PaddlePaddle/Paddle/pull/67012), [#67243](https://github.com/PaddlePaddle/Paddle/pull/67243), [#67367](https://github.com/PaddlePaddle/Paddle/pull/67367), [#67760](https://github.com/PaddlePaddle/Paddle/pull/67760), [#67242](https://github.com/PaddlePaddle/Paddle/pull/67242), [#67189](https://github.com/PaddlePaddle/Paddle/pull/67189), [#67899](https://github.com/PaddlePaddle/Paddle/pull/67899), [#67687](https://github.com/PaddlePaddle/Paddle/pull/67687), [#68035](https://github.com/PaddlePaddle/Paddle/pull/68035), [#67682](https://github.com/PaddlePaddle/Paddle/pull/67682), [#68464](https://github.com/PaddlePaddle/Paddle/pull/68464), [#68469](https://github.com/PaddlePaddle/Paddle/pull/68469), [#67900](https://github.com/PaddlePaddle/Paddle/pull/67900), [#68563](https://github.com/PaddlePaddle/Paddle/pull/68563), [#68562](https://github.com/PaddlePaddle/Paddle/pull/68562), [#68564](https://github.com/PaddlePaddle/Paddle/pull/68564), [#68479](https://github.com/PaddlePaddle/Paddle/pull/68479), [#68588](https://github.com/PaddlePaddle/Paddle/pull/68588), [#68726](https://github.com/PaddlePaddle/Paddle/pull/68726), [#68719](https://github.com/PaddlePaddle/Paddle/pull/68719), [#68767](https://github.com/PaddlePaddle/Paddle/pull/68767), [#68557](https://github.com/PaddlePaddle/Paddle/pull/68557), [#68671](https://github.com/PaddlePaddle/Paddle/pull/68671), [#68786](https://github.com/PaddlePaddle/Paddle/pull/68786), [#67948](https://github.com/PaddlePaddle/Paddle/pull/67948), [#64999](https://github.com/PaddlePaddle/Paddle/pull/64999), [#68581](https://github.com/PaddlePaddle/Paddle/pull/68581), [#68361](https://github.com/PaddlePaddle/Paddle/pull/68361), [#68656](https://github.com/PaddlePaddle/Paddle/pull/68656), 
[#68396](https://github.com/PaddlePaddle/Paddle/pull/68396), [#68059](https://github.com/PaddlePaddle/Paddle/pull/68059), [#68785](https://github.com/PaddlePaddle/Paddle/pull/68785), [#68665](https://github.com/PaddlePaddle/Paddle/pull/68665), [#68869](https://github.com/PaddlePaddle/Paddle/pull/68869), [#67626](https://github.com/PaddlePaddle/Paddle/pull/67626), [#68921](https://github.com/PaddlePaddle/Paddle/pull/68921), [#69268](https://github.com/PaddlePaddle/Paddle/pull/69268), [#69271](https://github.com/PaddlePaddle/Paddle/pull/69271), [#69306](https://github.com/PaddlePaddle/Paddle/pull/69306), [#69302](https://github.com/PaddlePaddle/Paddle/pull/69302), [#69341](https://github.com/PaddlePaddle/Paddle/pull/69341), [#69364](https://github.com/PaddlePaddle/Paddle/pull/69364), [#69343](https://github.com/PaddlePaddle/Paddle/pull/69343), [#69383](https://github.com/PaddlePaddle/Paddle/pull/69383), [#69415](https://github.com/PaddlePaddle/Paddle/pull/69415), [#69437](https://github.com/PaddlePaddle/Paddle/pull/69437), [#69494](https://github.com/PaddlePaddle/Paddle/pull/69494), [#69541](https://github.com/PaddlePaddle/Paddle/pull/69541), [#69543](https://github.com/PaddlePaddle/Paddle/pull/69543), [#69540](https://github.com/PaddlePaddle/Paddle/pull/69540), [#69569](https://github.com/PaddlePaddle/Paddle/pull/69569), [#69568](https://github.com/PaddlePaddle/Paddle/pull/69568), [#69621](https://github.com/PaddlePaddle/Paddle/pull/69621), [#69622](https://github.com/PaddlePaddle/Paddle/pull/69622), [#69701](https://github.com/PaddlePaddle/Paddle/pull/69701), [#69702](https://github.com/PaddlePaddle/Paddle/pull/69702), [#69704](https://github.com/PaddlePaddle/Paddle/pull/69704), [#69743](https://github.com/PaddlePaddle/Paddle/pull/69743), [#69780](https://github.com/PaddlePaddle/Paddle/pull/69780), [#69814](https://github.com/PaddlePaddle/Paddle/pull/69814), [#69822](https://github.com/PaddlePaddle/Paddle/pull/69822), 
[#69893](https://github.com/PaddlePaddle/Paddle/pull/69893), [#69967](https://github.com/PaddlePaddle/Paddle/pull/69967), [#69976](https://github.com/PaddlePaddle/Paddle/pull/69976), [#70011](https://github.com/PaddlePaddle/Paddle/pull/70011), [#70015](https://github.com/PaddlePaddle/Paddle/pull/70015), [#70007](https://github.com/PaddlePaddle/Paddle/pull/70007), [#70010](https://github.com/PaddlePaddle/Paddle/pull/70010), [#70346](https://github.com/PaddlePaddle/Paddle/pull/70346), [#70414](https://github.com/PaddlePaddle/Paddle/pull/70414), [#69951](https://github.com/PaddlePaddle/Paddle/pull/69951), [#70299](https://github.com/PaddlePaddle/Paddle/pull/70299), [#70441](https://github.com/PaddlePaddle/Paddle/pull/70441), [#70435](https://github.com/PaddlePaddle/Paddle/pull/70435), [#68420](https://github.com/PaddlePaddle/Paddle/pull/68420), [#70671](https://github.com/PaddlePaddle/Paddle/pull/70671), [#70705](https://github.com/PaddlePaddle/Paddle/pull/70705), [#68540](https://github.com/PaddlePaddle/Paddle/pull/68540), [#70211](https://github.com/PaddlePaddle/Paddle/pull/70211), [#67489](https://github.com/PaddlePaddle/Paddle/pull/67489), [#66927](https://github.com/PaddlePaddle/Paddle/pull/66927), [#66942](https://github.com/PaddlePaddle/Paddle/pull/66942), [#66848](https://github.com/PaddlePaddle/Paddle/pull/66848), [#66796](https://github.com/PaddlePaddle/Paddle/pull/66796), [#67036](https://github.com/PaddlePaddle/Paddle/pull/67036), [#67244](https://github.com/PaddlePaddle/Paddle/pull/67244), [#67299](https://github.com/PaddlePaddle/Paddle/pull/67299), [#67171](https://github.com/PaddlePaddle/Paddle/pull/67171), [#67293](https://github.com/PaddlePaddle/Paddle/pull/67293), [#67208](https://github.com/PaddlePaddle/Paddle/pull/67208), [#67408](https://github.com/PaddlePaddle/Paddle/pull/67408), [#67523](https://github.com/PaddlePaddle/Paddle/pull/67523), [#67689](https://github.com/PaddlePaddle/Paddle/pull/67689), 
[#67694](https://github.com/PaddlePaddle/Paddle/pull/67694), [#67797](https://github.com/PaddlePaddle/Paddle/pull/67797), [#67894](https://github.com/PaddlePaddle/Paddle/pull/67894), [#65969](https://github.com/PaddlePaddle/Paddle/pull/65969), [#65939](https://github.com/PaddlePaddle/Paddle/pull/65939), [#67928](https://github.com/PaddlePaddle/Paddle/pull/67928), [#68097](https://github.com/PaddlePaddle/Paddle/pull/68097), [#66744](https://github.com/PaddlePaddle/Paddle/pull/66744), [#68496](https://github.com/PaddlePaddle/Paddle/pull/68496), [#66943](https://github.com/PaddlePaddle/Paddle/pull/66943), [#68773](https://github.com/PaddlePaddle/Paddle/pull/68773), [#69272](https://github.com/PaddlePaddle/Paddle/pull/69272)
-- Move test file locations. [#67564](https://github.com/PaddlePaddle/Paddle/pull/67564), [#68266](https://github.com/PaddlePaddle/Paddle/pull/68266), [#68634](https://github.com/PaddlePaddle/Paddle/pull/68634)
-- Pre-modification related to xshape output exit. [#67543](https://github.com/PaddlePaddle/Paddle/pull/67543), [#67572](https://github.com/PaddlePaddle/Paddle/pull/67572)
+- Support automatic conversion and optimization of Layout in training scenarios. ([#71891](https://github.com/PaddlePaddle/Paddle/pull/71891))
+- Added backend kernel compilation optimizations for operators such as argmin, argmax, and arange. ([#71956](https://github.com/PaddlePaddle/Paddle/pull/71956), [#72598](https://github.com/PaddlePaddle/Paddle/pull/72598))
+- Support for fused optimization of matrix multiplication. ([#72846](https://github.com/PaddlePaddle/Paddle/pull/72846))
+- Optimize the kernel computation performance of some operators. ([#72871](https://github.com/PaddlePaddle/Paddle/pull/72871))
-### Improvement
+### Bug fixes
-- Supported more data types. [#69143](https://github.com/PaddlePaddle/Paddle/pull/69143)
-- Update xpu interface. [#69800](https://github.com/PaddlePaddle/Paddle/pull/69800)
-- Improved operator printing functionality. [#69916](https://github.com/PaddlePaddle/Paddle/pull/69916)
-- Upgraded the normalize operation to support more scenarios. [#70152](https://github.com/PaddlePaddle/Paddle/pull/70152)
-- Extended group_norm to handle cases where the rank is greater than 5. [#68774](https://github.com/PaddlePaddle/Paddle/pull/68774)
-- Improved the usage of backward_blacklist. [#69356](https://github.com/PaddlePaddle/Paddle/pull/69356)
+Fix some processing logic bugs in various scenarios. ([#71813](https://github.com/PaddlePaddle/Paddle/pull/71813), [#71886](https://github.com/PaddlePaddle/Paddle/pull/71886), [#71927](https://github.com/PaddlePaddle/Paddle/pull/71927), [#71915](https://github.com/PaddlePaddle/Paddle/pull/71915), [#71946](https://github.com/PaddlePaddle/Paddle/pull/71946), [#71949](https://github.com/PaddlePaddle/Paddle/pull/71949), [#71955](https://github.com/PaddlePaddle/Paddle/pull/71955), [#71942](https://github.com/PaddlePaddle/Paddle/pull/71942), [#71939](https://github.com/PaddlePaddle/Paddle/pull/71939), [#71973](https://github.com/PaddlePaddle/Paddle/pull/71973), [#72001](https://github.com/PaddlePaddle/Paddle/pull/72001), [#72020](https://github.com/PaddlePaddle/Paddle/pull/72020), [#72014](https://github.com/PaddlePaddle/Paddle/pull/72014), [#72021](https://github.com/PaddlePaddle/Paddle/pull/72021), [#72027](https://github.com/PaddlePaddle/Paddle/pull/72027), [#72061](https://github.com/PaddlePaddle/Paddle/pull/72061), [#72025](https://github.com/PaddlePaddle/Paddle/pull/72025), [#72095](https://github.com/PaddlePaddle/Paddle/pull/72095), [#72108](https://github.com/PaddlePaddle/Paddle/pull/72108), [#72132](https://github.com/PaddlePaddle/Paddle/pull/72132), [#71985](https://github.com/PaddlePaddle/Paddle/pull/71985), [#72106](https://github.com/PaddlePaddle/Paddle/pull/72106), [#72140](https://github.com/PaddlePaddle/Paddle/pull/72140), [#72167](https://github.com/PaddlePaddle/Paddle/pull/72167), [#72037](https://github.com/PaddlePaddle/Paddle/pull/72037), [#72178](https://github.com/PaddlePaddle/Paddle/pull/72178), [#72143](https://github.com/PaddlePaddle/Paddle/pull/72143), [#72175](https://github.com/PaddlePaddle/Paddle/pull/72175), [#72191](https://github.com/PaddlePaddle/Paddle/pull/72191), [#72213](https://github.com/PaddlePaddle/Paddle/pull/72213), [#72189](https://github.com/PaddlePaddle/Paddle/pull/72189), 
[#72214](https://github.com/PaddlePaddle/Paddle/pull/72214), [#72166](https://github.com/PaddlePaddle/Paddle/pull/72166), [#72180](https://github.com/PaddlePaddle/Paddle/pull/72180), [#72284](https://github.com/PaddlePaddle/Paddle/pull/72284), [#72267](https://github.com/PaddlePaddle/Paddle/pull/72267), [#72348](https://github.com/PaddlePaddle/Paddle/pull/72348), [#72332](https://github.com/PaddlePaddle/Paddle/pull/72332), [#72307](https://github.com/PaddlePaddle/Paddle/pull/72307), [#72353](https://github.com/PaddlePaddle/Paddle/pull/72353), [#72204](https://github.com/PaddlePaddle/Paddle/pull/72204), [#72457](https://github.com/PaddlePaddle/Paddle/pull/72457), [#72426](https://github.com/PaddlePaddle/Paddle/pull/72426), [#72536](https://github.com/PaddlePaddle/Paddle/pull/72536), [#72541](https://github.com/PaddlePaddle/Paddle/pull/72541), [#72365](https://github.com/PaddlePaddle/Paddle/pull/72365), [#72621](https://github.com/PaddlePaddle/Paddle/pull/72621), [#72630](https://github.com/PaddlePaddle/Paddle/pull/72630), [#72669](https://github.com/PaddlePaddle/Paddle/pull/72669), [#72682](https://github.com/PaddlePaddle/Paddle/pull/72682), [#72732](https://github.com/PaddlePaddle/Paddle/pull/72732), [#72811](https://github.com/PaddlePaddle/Paddle/pull/72811), [#72941](https://github.com/PaddlePaddle/Paddle/pull/72941), [#72795](https://github.com/PaddlePaddle/Paddle/pull/72795), [#73536](https://github.com/PaddlePaddle/Paddle/pull/73536))
+
+## 4. Auto Parallel architecture
+
+In version 3.1, we further refined the auto parallel architecture to improve its usability and its dynamic graph performance. Specifically, we improved the core mechanism by adding sharding derivation rules for more operators, supporting the sharding of one dimension of a distributed tensor across multiple mesh dimensions, and supporting dynamic graph parallel strategies (PP, CP, SEP, TP-CONV). We also systematically optimized the performance of the dynamic graph auto parallel system, reaching performance roughly on par with manual parallelism on models such as Llama.
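
To make the idea of one tensor dimension being split by several mesh dimensions concrete, here is a minimal sketch in plain Python. It does not use Paddle's API: the helper `local_shape` and the `placements` encoding (one entry per mesh dimension, holding the tensor dimension it shards or `None` for replicated) are assumptions made for illustration, and the sketch assumes the sizes divide evenly.

```python
def local_shape(global_shape, mesh_shape, placements):
    """Return the per-rank shard shape of a distributed tensor.

    placements[i] is the tensor dimension sharded by mesh dimension i,
    or None when the tensor is replicated along that mesh dimension.
    (Hypothetical encoding, used only for this sketch.)
    """
    shape = list(global_shape)
    for mesh_dim, tensor_dim in enumerate(placements):
        if tensor_dim is None:
            continue
        size = mesh_shape[mesh_dim]
        if shape[tensor_dim] % size != 0:
            raise ValueError("this sketch assumes even divisibility")
        # Each mesh dimension that shards this tensor dimension splits it again.
        shape[tensor_dim] //= size
    return tuple(shape)


# Tensor dimension 0 is sharded by both dimensions of a 2 x 4 mesh.
same_dim = local_shape((16, 32), (2, 4), [0, 0])
# Conventional case: each mesh dimension shards a different tensor dimension.
different_dims = local_shape((16, 32), (2, 4), [0, 1])
print(same_dim, different_dims)
```

In the first call every rank holds a 2 x 32 slice, because dimension 0 (size 16) is divided by 2 and then by 4; in the second call every rank holds an 8 x 8 slice.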
+
+### Improvements
+
+- Support for distributed tensors where the same dimension is partitioned by multiple mesh dimensions. [#73233](https://github.com/PaddlePaddle/Paddle/pull/73233)
+- Support for converting automatic parallel communication topology descriptions (ProcessMesh) into manual parallel communication groups. [#72052](https://github.com/PaddlePaddle/Paddle/pull/72052)
+- Support send/recv of any serializable Python object. [#72098](https://github.com/PaddlePaddle/Paddle/pull/72098)
+- Complete the dynamic graph parallel strategies:
+  - Support the 1F1B and VPP scheduling strategies for pipeline parallelism. [#72155](https://github.com/PaddlePaddle/Paddle/pull/72155), [#72480](https://github.com/PaddlePaddle/Paddle/pull/72480), [#72179](https://github.com/PaddlePaddle/Paddle/pull/72179)
+  - Support for parallel processing of long texts. [#73195](https://github.com/PaddlePaddle/Paddle/pull/73195)
+  - Support for visual parallelism strategy. [#73063](https://github.com/PaddlePaddle/Paddle/pull/73063), [#73039](https://github.com/PaddlePaddle/Paddle/pull/73039)
+  - Support automatic parallel communication in the data parallel dimension. [#72540](https://github.com/PaddlePaddle/Paddle/pull/72540)
+- Add sharding derivation rules for the following operators:
+  - `min`, `min_grad` [#72269](https://github.com/PaddlePaddle/Paddle/pull/72269)
+  - `bitwise_or`, `atan2`, `fmax`, `fmin`, `reciprocal` [#72310](https://github.com/PaddlePaddle/Paddle/pull/72310)
+  - `argmin`, `abs`, `cosh` [#72264](https://github.com/PaddlePaddle/Paddle/pull/72264)
+  - `mean_all`, `mean_all_grad` [#72479](https://github.com/PaddlePaddle/Paddle/pull/72479)
+  - `topk`, `topk_grad` [#72499](https://github.com/PaddlePaddle/Paddle/pull/72499)
+  - `argsort` [#72388](https://github.com/PaddlePaddle/Paddle/pull/72388)
+  - `round`, `mish`, `elu`, `selu`, `celu`, `stanh`, `softplus`, `softshrink`, `thresholded_relu`, `logit`, `nonzero` [#72312](https://github.com/PaddlePaddle/Paddle/pull/72312)
+  - `unique ops` [#72824](https://github.com/PaddlePaddle/Paddle/pull/72824)
+  - `put_along_axis` [#72766](https://github.com/PaddlePaddle/Paddle/pull/72766)
+  - `round_grad`, `trunc_grad`, `ceil_grad`, `floor_grad`, `poisson_grad` [#72677](https://github.com/PaddlePaddle/Paddle/pull/72677)
+  - `log_softmax`, `cummax`, `cummin` [#72720](https://github.com/PaddlePaddle/Paddle/pull/72720)
+  - `unary` [#72177](https://github.com/PaddlePaddle/Paddle/pull/72177)
+  - `unary_grad` [#72260](https://github.com/PaddlePaddle/Paddle/pull/72260)
+  - `index_select`, `index_select_grad` [#72727](https://github.com/PaddlePaddle/Paddle/pull/72727)
+  - `roll`, `roll_grad` [#72740](https://github.com/PaddlePaddle/Paddle/pull/72740)
+  - `empty_like` [#73169](https://github.com/PaddlePaddle/Paddle/pull/73169)
+  - `roi_align`, `roi_align_grad` [#72925](https://github.com/PaddlePaddle/Paddle/pull/72925)
+  - `expand_as`, `expand_as_grad` [#73107](https://github.com/PaddlePaddle/Paddle/pull/73107)
+  - `fused_gemm_epilogue` [#73126](https://github.com/PaddlePaddle/Paddle/pull/73126)
+  - `label_smooth`, `label_smooth_grad` [#72845](https://github.com/PaddlePaddle/Paddle/pull/72845)
+  - `group_norm`, `group_norm_grad` [#72946](https://github.com/PaddlePaddle/Paddle/pull/72946)
+  - `instance_norm`, `instance_norm_grad` [#72938](https://github.com/PaddlePaddle/Paddle/pull/72938)
+  - `batch_norm`, `sync_batch_norm` [#72918](https://github.com/PaddlePaddle/Paddle/pull/72918)
+  - `reduce_any` [#73175](https://github.com/PaddlePaddle/Paddle/pull/73175)
+  - `fused_gemm_epilogue_rule` [#73494](https://github.com/PaddlePaddle/Paddle/pull/73494)
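
As a rough illustration of what a derivation rule for an operator does, the sketch below infers output placements from input placements for two operator families from the list above: elementwise unary operators and a reduction such as `min`. It is plain Python and not Paddle's rule implementation; the placement encoding (`"R"`, `"P"`, `("S", dim)`) and the function names are assumptions, and the reduction is assumed to drop the reduced axis.

```python
REPLICATE = "R"   # tensor is fully replicated along this mesh dimension
PARTIAL = "P"     # each rank holds a partial result that still needs a reduce


def shard(dim):
    """Placement meaning: tensor dimension `dim` is split along this mesh dimension."""
    return ("S", dim)


def unary_rule(in_placements):
    # Elementwise operators act on each element independently,
    # so the output can keep the input layout without communication.
    return list(in_placements)


def reduce_rule(in_placements, axis):
    # Reducing over a sharded axis leaves every rank with a partial result.
    # Dimensions after the reduced axis shift down by one (keepdim=False).
    out = []
    for placement in in_placements:
        if isinstance(placement, tuple):
            _, dim = placement
            if dim == axis:
                out.append(PARTIAL)
            elif dim > axis:
                out.append(shard(dim - 1))
            else:
                out.append(placement)
        else:
            out.append(placement)
    return out


inputs = [shard(0), shard(1)]          # 2-D mesh, both tensor dims sharded
print(unary_rule(inputs))
print(reduce_rule(inputs, axis=0))
```

A `Partial` output defers the cross-rank reduction until a consumer actually needs the full value, which is why the choice of rule affects communication volume.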
+
+### Performance
+
+- Support the tensor_fusion and overlap optimization strategies with group sharded parallelism. [#72551](https://github.com/PaddlePaddle/Paddle/pull/72551), [#72902](https://github.com/PaddlePaddle/Paddle/pull/72902), [#73142](https://github.com/PaddlePaddle/Paddle/pull/73142), [#71785](https://github.com/PaddlePaddle/Paddle/pull/71785)
+- Optimize the reshard module to reduce communication overhead. [#71969](https://github.com/PaddlePaddle/Paddle/pull/71969), [#73024](https://github.com/PaddlePaddle/Paddle/pull/73024), [#71868](https://github.com/PaddlePaddle/Paddle/pull/71868)
+- Optimize the sharding derivation rule for `multiply` to reduce communication overhead. [#73408](https://github.com/PaddlePaddle/Paddle/pull/73408)
+- Optimize backward communication when the distributed placement is Partial, to reduce communication overhead. [#73236](https://github.com/PaddlePaddle/Paddle/pull/73236)
+- Fuse communication during gradient updates. [#72120](https://github.com/PaddlePaddle/Paddle/pull/72120), [#72745](https://github.com/PaddlePaddle/Paddle/pull/72745)
+- Optimize the sharding derivation rule for `gelu` to reduce communication overhead. [#73279](https://github.com/PaddlePaddle/Paddle/pull/73279)
+- Optimize the sharding derivation rule for `fused_rms_norm` when an input is in the Partial state, to reduce communication and computation overhead. [#73054](https://github.com/PaddlePaddle/Paddle/pull/73054)
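
The general idea behind tensor fusion can be sketched as follows: many small gradient buffers are packed into one flat buffer so that a single collective replaces many small ones. This is a plain-Python illustration under stated assumptions, not Paddle's implementation; `fuse`, `unfuse`, and `fake_all_reduce` are hypothetical helpers, and the "all-reduce" is simulated in one process.

```python
def fuse(tensors):
    """Pack a list of 1-D buffers into one flat buffer plus (offset, length) spans."""
    flat, spans = [], []
    for tensor in tensors:
        spans.append((len(flat), len(tensor)))
        flat.extend(tensor)
    return flat, spans


def unfuse(flat, spans):
    """Recover the original buffers from the flat buffer."""
    return [flat[start:start + length] for start, length in spans]


def fake_all_reduce(buffers_per_rank):
    """Simulated sum all-reduce: elementwise sum over the ranks' buffers."""
    return [sum(values) for values in zip(*buffers_per_rank)]


# Two simulated ranks, each holding gradients for the same two parameters.
rank0_grads = [[1.0, 2.0], [3.0]]
rank1_grads = [[10.0, 20.0], [30.0]]

flat0, spans = fuse(rank0_grads)
flat1, _ = fuse(rank1_grads)

# One collective on the fused buffer instead of one per parameter.
reduced = fake_all_reduce([flat0, flat1])
print(unfuse(reduced, spans))
```

Fewer, larger collectives amortize per-call latency, and a fused buffer is also easier to overlap with computation.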
-### Performance improvement
+### Bug fixes
-- Optimized the performance of the `where_double_grad` operator. [#70404](https://github.com/PaddlePaddle/Paddle/pull/70404)
-- Change "for range" to "slice" to speed up the execution of grad. [#69938](https://github.com/PaddlePaddle/Paddle/pull/69938)
+- Fixed a communication hang in the virtual pipeline parallel strategy on H-card. [#71104](https://github.com/PaddlePaddle/Paddle/pull/71104), [#73470](https://github.com/PaddlePaddle/Paddle/pull/73470)
+- Fixed the bug in save/load. [#72023](https://github.com/PaddlePaddle/Paddle/pull/72023)
+- Fixed the bug that the linear_fused_grad_add strategy did not work in dynamic graph mode. [#72708](https://github.com/PaddlePaddle/Paddle/pull/72708)
+- Fixed the `fused_rms_norm` operator failing to run, along with its accuracy bugs. [#72663](https://github.com/PaddlePaddle/Paddle/pull/72663)
+- Fixed a bug in the sharding derivation rule for the `expand` operator. [#73154](https://github.com/PaddlePaddle/Paddle/pull/73154)
-## 6. Framework performance optimization
+### Others
-PRs related to performance optimization, encompassing optimizing operator performance, enhancing kernel performance, optimizing memory usage, and refining namespaces, all aim to provide users with a superior development experience.
+- Clean up dead code to facilitate code maintenance. [#71814](https://github.com/PaddlePaddle/Paddle/pull/71814), [#72538](https://github.com/PaddlePaddle/Paddle/pull/72538)
+- Added a new API, `local_map`, to pass distributed tensors to functions written for ordinary tensors. ([#71804](https://github.com/PaddlePaddle/Paddle/pull/71804))
+- Add checks for operator fused_linear_param_grad_add. ([#72483](https://github.com/PaddlePaddle/Paddle/pull/72483))
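
The concept behind `local_map` can be illustrated with a toy model: a wrapper unwraps each distributed argument to its local shard, calls a function written for ordinary values, and wraps the per-rank results again. The sketch below is plain Python and does not reproduce Paddle's `local_map` signature; `ToyDistTensor` and `toy_local_map` are hypothetical names, and the loop over ranks only simulates what each rank would do on its own shard.

```python
class ToyDistTensor:
    """Toy stand-in for a distributed tensor: one local shard per rank."""

    def __init__(self, shards):
        self.shards = shards


def toy_local_map(fn):
    """Wrap `fn` so it runs on each rank's local shard (simulated in a loop)."""

    def wrapper(*args):
        dist_args = [a for a in args if isinstance(a, ToyDistTensor)]
        num_ranks = len(dist_args[0].shards)
        outputs = []
        for rank in range(num_ranks):
            # Replace every distributed argument with this rank's local data.
            local_args = [
                a.shards[rank] if isinstance(a, ToyDistTensor) else a
                for a in args
            ]
            outputs.append(fn(*local_args))
        return ToyDistTensor(outputs)

    return wrapper


@toy_local_map
def scale(values, factor):
    # Written for ordinary lists; knows nothing about distribution.
    return [v * factor for v in values]


result = scale(ToyDistTensor([[1, 2], [3, 4]]), 10)
print(result.shards)
```
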
-### New Features
+## 5. Operator Mechanism
-- Enhanced support for fp8 type. [#64735](https://github.com/PaddlePaddle/Paddle/pull/64735), [#64955](https://github.com/PaddlePaddle/Paddle/pull/64955)
-- Enhanced support for XPU. [#65362](https://github.com/PaddlePaddle/Paddle/pull/65362), [#65304](https://github.com/PaddlePaddle/Paddle/pull/65304), [#68451](https://github.com/PaddlePaddle/Paddle/pull/68451)
-- Enhanced support for DCU. [#65398](https://github.com/PaddlePaddle/Paddle/pull/65398), [#65857](https://github.com/PaddlePaddle/Paddle/pull/65857), [#66423](https://github.com/PaddlePaddle/Paddle/pull/66423)
-- Expand the capabilities of oneDNN. [#66000](https://github.com/PaddlePaddle/Paddle/pull/66000), [#66474](https://github.com/PaddlePaddle/Paddle/pull/66474), [#66568](https://github.com/PaddlePaddle/Paddle/pull/66568)
-- Rename parameters and support more complex masks. [#65409](https://github.com/PaddlePaddle/Paddle/pull/65409)
-- Support for flash-attention. [#68968](https://github.com/PaddlePaddle/Paddle/pull/68968)
-- Support OpenVINO CPU high-performance inference. [#69122](https://github.com/PaddlePaddle/Paddle/pull/69122)
-
-### Functional improvements
+### New Features
-- Enhance PIR pass to achieve better fusion. [#65540](https://github.com/PaddlePaddle/Paddle/pull/65540)
-- Enhanced OneDNN functionality. [#65971](https://github.com/PaddlePaddle/Paddle/pull/65971), [#70430](https://github.com/PaddlePaddle/Paddle/pull/70430), [#70630](https://github.com/PaddlePaddle/Paddle/pull/70630), [#70871](https://github.com/PaddlePaddle/Paddle/pull/70871)
-- Improve the performance of FlashMask. [#68109](https://github.com/PaddlePaddle/Paddle/pull/68109)
-- Optimize kernel performance. [#69660](https://github.com/PaddlePaddle/Paddle/pull/69660), [#69596](https://github.com/PaddlePaddle/Paddle/pull/69596)
-- Combinatorial operator optimization. [#69515](https://github.com/PaddlePaddle/Paddle/pull/69515), [#69616](https://github.com/PaddlePaddle/Paddle/pull/69616)
+- Gradient and automatic differentiation optimization: Added initial support for double gradient computation for the `put_along_axis` and `repeat_interleave` operations, improved the numerical stability of complex operators in automatic differentiation scenarios, and implemented operator decomposition for `masked_fill`. [#72789](https://github.com/PaddlePaddle/Paddle/pull/72789), [#73056](https://github.com/PaddlePaddle/Paddle/pull/73056), [#73225](https://github.com/PaddlePaddle/Paddle/pull/73225)
+- Operator mechanism extension: Added custom support for `__radd__` and `__rmul__`, enhancing the framework's ability to overload asymmetric operators. [#73119](https://github.com/PaddlePaddle/Paddle/pull/73119)
+- FP8 Module Support and Operator Development: Added support for FP8 block quantization GEMM, introduced multiple fused operators, and provided efficient operator-level implementations for Mixture of Experts (MoE) models, enhancing training and inference performance. [#73228](https://github.com/PaddlePaddle/Paddle/pull/73228), [#73285](https://github.com/PaddlePaddle/Paddle/pull/73285), [#73133](https://github.com/PaddlePaddle/Paddle/pull/73133), [#73364](https://github.com/PaddlePaddle/Paddle/pull/73364), [#73520](https://github.com/PaddlePaddle/Paddle/pull/73520), [#73531](https://github.com/PaddlePaddle/Paddle/pull/73531)
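
For readers unfamiliar with `__radd__` and `__rmul__`, the following sketch shows how Python dispatches to these reflected methods when the left operand does not know how to handle the right one. The `Scalar` class is a hypothetical stand-in, not a Paddle type.

```python
class Scalar:
    """Minimal numeric wrapper that supports `number + Scalar` and `number * Scalar`."""

    def __init__(self, value):
        self.value = value

    @staticmethod
    def _unwrap(other):
        return other.value if isinstance(other, Scalar) else other

    def __add__(self, other):
        return Scalar(self.value + self._unwrap(other))

    def __mul__(self, other):
        return Scalar(self.value * self._unwrap(other))

    def __radd__(self, other):
        # Called for `other + self` when type(other).__add__ returns NotImplemented.
        return Scalar(self._unwrap(other) + self.value)

    def __rmul__(self, other):
        # Called for `other * self` when type(other).__mul__ returns NotImplemented.
        return Scalar(self._unwrap(other) * self.value)


left = 2 + Scalar(3)    # int.__add__ gives up, so Scalar.__radd__ runs
right = 4 * Scalar(3)   # int.__mul__ gives up, so Scalar.__rmul__ runs
print(left.value, right.value)
```

Without the reflected methods, both expressions would raise `TypeError`, because `int` cannot add or multiply a `Scalar` on its own.
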
### Bug Fixes
-- Fixed bugs related to PIR, CINN, SOT, OneDNN, etc. [#68951](https://github.com/PaddlePaddle/Paddle/pull/68951), [#69553](https://github.com/PaddlePaddle/Paddle/pull/69553), [#69682](https://github.com/PaddlePaddle/Paddle/pull/69682), [#67741](https://github.com/PaddlePaddle/Paddle/pull/67741), [#69346](https://github.com/PaddlePaddle/Paddle/pull/69346), [#69401](https://github.com/PaddlePaddle/Paddle/pull/69401), [#68903](https://github.com/PaddlePaddle/Paddle/pull/68903)
-- Fixed bugs related to composite operators. [#69479](https://github.com/PaddlePaddle/Paddle/pull/69479), [#69487](https://github.com/PaddlePaddle/Paddle/pull/69487), [#67176](https://github.com/PaddlePaddle/Paddle/pull/67176)
-- Fixed the issue with the FP8 data type on the CPU. [#65539](https://github.com/PaddlePaddle/Paddle/pull/65539)
-- Remove unnecessary overhead for creating events in computational flow. [#67315](https://github.com/PaddlePaddle/Paddle/pull/67247)
-- Fixed performance issues. [#68378](https://github.com/PaddlePaddle/Paddle/pull/68378)
-- Fixed issues related to types. [#69720](https://github.com/PaddlePaddle/Paddle/pull/69720)
-- Fixed other issues. [#70019](https://github.com/PaddlePaddle/Paddle/pull/70019), [#70008](https://github.com/PaddlePaddle/Paddle/pull/70008), [#70645](https://github.com/PaddlePaddle/Paddle/pull/70645), [#71209](https://github.com/PaddlePaddle/Paddle/pull/71209), [#68152](https://github.com/PaddlePaddle/Paddle/pull/68152), [#69907](https://github.com/PaddlePaddle/Paddle/pull/69907), [#71207](https://github.com/PaddlePaddle/Paddle/pull/71207)
-
-### Performance optimization
-
-- Optimizations related to the CINN compiler. [#69455](https://github.com/PaddlePaddle/Paddle/pull/69455), [#70284](https://github.com/PaddlePaddle/Paddle/pull/70284), [#67576](https://github.com/PaddlePaddle/Paddle/pull/67576), [#68946](https://github.com/PaddlePaddle/Paddle/pull/68946), [#68615](https://github.com/PaddlePaddle/Paddle/pull/68615)
-- Optimizations related to oneDNN. [#68784](https://github.com/PaddlePaddle/Paddle/pull/68784), [#68716](https://github.com/PaddlePaddle/Paddle/pull/68716), [#67554](https://github.com/PaddlePaddle/Paddle/pull/67554)
-- Memory-related optimizations. [#68660](https://github.com/PaddlePaddle/Paddle/pull/68571), [#69930](https://github.com/PaddlePaddle/Paddle/pull/69930), [#68174](https://github.com/PaddlePaddle/Paddle/pull/68174), [#68660](https://github.com/PaddlePaddle/Paddle/pull/68571), [#70359](https://github.com/PaddlePaddle/Paddle/pull/70359)
-- Kernel computation-related optimizations. [#65507](https://github.com/PaddlePaddle/Paddle/pull/65507), [#68541](https://github.com/PaddlePaddle/Paddle/pull/68541), [#71479](https://github.com/PaddlePaddle/Paddle/pull/71479), [#71403](https://github.com/PaddlePaddle/Paddle/pull/71403)
-- XPU-related optimizations. [#67051](https://github.com/PaddlePaddle/Paddle/pull/67051)
-- Other optimizations include pass optimization of the inference process, dynamic shape optimization in automatic parallelism, and FlashAttention computation optimization. [#68394](https://github.com/PaddlePaddle/Paddle/pull/68394), [#68696](https://github.com/PaddlePaddle/Paddle/pull/68696), [#68759](https://github.com/PaddlePaddle/Paddle/pull/68759), [#68791](https://github.com/PaddlePaddle/Paddle/pull/68791), [#69390](https://github.com/PaddlePaddle/Paddle/pull/69390), [#69961](https://github.com/PaddlePaddle/Paddle/pull/69961), [#69939](https://github.com/PaddlePaddle/Paddle/pull/69939), [#70455](https://github.com/PaddlePaddle/Paddle/pull/70455), [#70663](https://github.com/PaddlePaddle/Paddle/pull/70663), [#71290](https://github.com/PaddlePaddle/Paddle/pull/71123)
-
-### Others
-
-- Modify function namespaces. [#66818](https://github.com/PaddlePaddle/Paddle/pull/66818), [#67023](https://github.com/PaddlePaddle/Paddle/pull/67023), [#67114](https://github.com/PaddlePaddle/Paddle/pull/67114), [#67217](https://github.com/PaddlePaddle/Paddle/pull/67217), [#67524](https://github.com/PaddlePaddle/Paddle/pull/67524), [#67796](https://github.com/PaddlePaddle/Paddle/pull/67796), [#67881](https://github.com/PaddlePaddle/Paddle/pull/67881)
- Upgrade OneDNN. [#69917](https://github.com/PaddlePaddle/Paddle/pull/69917)
-- Modify the pass level. [#69524](https://github.com/PaddlePaddle/Paddle/pull/69524)
-- Optimizations related to memory read and write. [#65804](https://github.com/PaddlePaddle/Paddle/pull/65804), [#66923](https://github.com/PaddlePaddle/Paddle/pull/66923)
-- Optimize the GetValueName-related signatures. [#66363](https://github.com/PaddlePaddle/Paddle/pull/66363), [#66559](https://github.com/PaddlePaddle/Paddle/pull/66559), [#66738](https://github.com/PaddlePaddle/Paddle/pull/66738)
+- Gradient and automatic differentiation stability improvement: Fixed errors in the gradient computation of some backward operators, enhancing numerical stability and functional correctness in automatic differentiation scenarios. [#71716](https://github.com/PaddlePaddle/Paddle/pull/71716), [#72299](https://github.com/PaddlePaddle/Paddle/pull/72299), [#72358](https://github.com/PaddlePaddle/Paddle/pull/72358), [#73037](https://github.com/PaddlePaddle/Paddle/pull/73037), [#73140](https://github.com/PaddlePaddle/Paddle/pull/73140), [#73185](https://github.com/PaddlePaddle/Paddle/pull/73185)
+- Numerical accuracy and overflow protection: Addresses issues such as numerical overflow, loss of precision, and large tensor overflow, ensuring the reliability of low-precision computations and large tensor operations. [#72584](https://github.com/PaddlePaddle/Paddle/pull/72584), [#72608](https://github.com/PaddlePaddle/Paddle/pull/72608), [#72681](https://github.com/PaddlePaddle/Paddle/pull/72681), [#72639](https://github.com/PaddlePaddle/Paddle/pull/72639), [#73245](https://github.com/PaddlePaddle/Paddle/pull/73245), [#73359](https://github.com/PaddlePaddle/Paddle/pull/73359), [#72456](https://github.com/PaddlePaddle/Paddle/pull/72456)
+- Operator logic and framework alignment: Aligned operator logic, fixed issues such as abnormal operator inputs, and added checks to ensure the correctness of framework functionality. [#72282](https://github.com/PaddlePaddle/Paddle/pull/72282), [#71863](https://github.com/PaddlePaddle/Paddle/pull/71863), [#72650](https://github.com/PaddlePaddle/Paddle/pull/72650), [#72843](https://github.com/PaddlePaddle/Paddle/pull/72843), [#73070](https://github.com/PaddlePaddle/Paddle/pull/73070), [#73141](https://github.com/PaddlePaddle/Paddle/pull/73141), [#73203](https://github.com/PaddlePaddle/Paddle/pull/73203), [#73350](https://github.com/PaddlePaddle/Paddle/pull/73350), [#73440](https://github.com/PaddlePaddle/Paddle/pull/73440), [#73539](https://github.com/PaddlePaddle/Paddle/pull/73539), [#73339](https://github.com/PaddlePaddle/Paddle/pull/73339)
+- CUDA kernel and hardware adaptation optimization: Supports NVIDIA SM90 architecture, fixes issues such as overflow, removes redundant CUDA error checks, and enhances GPU computing efficiency and adaptability to new hardware. [#72507](https://github.com/PaddlePaddle/Paddle/pull/72507), [#72849](https://github.com/PaddlePaddle/Paddle/pull/72849), [#72959](https://github.com/PaddlePaddle/Paddle/pull/72959), [#73130](https://github.com/PaddlePaddle/Paddle/pull/73130), [#73489](https://github.com/PaddlePaddle/Paddle/pull/73489)
-### Discarded
+### Improvements
-- Remove obsolete files and functions. [#67514](https://github.com/PaddlePaddle/Paddle/pull/67514), [#67811](https://github.com/PaddlePaddle/Paddle/pull/67811), [#67911](https://github.com/PaddlePaddle/Paddle/pull/67911)
+- Added an int64_t implementation of fast division and modulo, improving computational performance and numerical stability in large integer scenarios. [#72530](https://github.com/PaddlePaddle/Paddle/pull/72530)
+- Optimize the strided tensor copy kernel to improve copy efficiency for non-contiguous memory layouts. [#72662](https://github.com/PaddlePaddle/Paddle/pull/72662)
-## 7. Inferential deployment
+- Unify the usage of the quantization API in dynamic and static graph modes, simplifying the development of quantization models. [#73100](https://github.com/PaddlePaddle/Paddle/pull/73100)
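
Fast division usually means replacing a hardware divide by a fixed divisor with a multiply and a shift, using a precomputed "magic" multiplier. The sketch below shows one well-known scheme (rounding the reciprocal up) in plain Python; it is an illustration of the technique under that assumption, not necessarily the scheme used in the PR. The names `magic_for` and `fast_divmod` are hypothetical, and Python's arbitrary-precision integers stand in for the wide high-multiply a real kernel would need.

```python
def magic_for(divisor, bits=64):
    """Precompute (magic, total_shift) for dividing values in [0, 2**bits) by divisor."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    shift = (divisor - 1).bit_length()            # ceil(log2(divisor))
    total_shift = bits + shift
    magic = -(-(1 << total_shift) // divisor)     # ceil(2**total_shift / divisor)
    return magic, total_shift


def fast_divmod(n, divisor, magic, total_shift):
    """Quotient and remainder of n / divisor using only multiply, shift, subtract."""
    quotient = (n * magic) >> total_shift
    return quotient, n - quotient * divisor


# Check against Python's built-in divmod on large 64-bit-range values.
for d in (1, 3, 7, 10, 1 << 20, (1 << 40) + 1):
    magic, total_shift = magic_for(d)
    for n in (0, 1, d - 1, d, 123456789012345, (1 << 63) - 1):
        assert fast_divmod(n, d, magic, total_shift) == divmod(n, d)
print("fast_divmod matches divmod on all sampled inputs")
```

The precomputation is done once per divisor (for example, once per tensor shape), so index calculations inside a kernel can avoid the comparatively slow 64-bit divide.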
-Focusing on two core directions: **the construction of the new generation of Proven Intermediate Representation (PIR) ecosystem** and **large model inference optimization**, the main breakthroughs include:
+### Performance
-1. **Deep fusion of PIR-TensorRT**
+- Optimize the decomposition performance of the gelu operator to enhance computational efficiency. [#72812](https://github.com/PaddlePaddle/Paddle/pull/72812)
-- Complete the refactoring and code optimization of the core execution mechanism, and develop over 50 operator converters
-- Added low-precision support (FP16/INT8) and Generic Plugin execution capability
-- Build a complete unit testing system that supports the entire process of model loading/saving
-
-2. **Leap in reasoning performance of large models**
-
-- Added full-process support for the Mixture of Experts (MoE) system, covering Hopper architecture optimization
-- Supports processing of 128K ultra-long sequences, enhancing long text reasoning capabilities
-- Implement cutting-edge quantization schemes such as FP8/W8A8 to reduce memory usage
+### Others
-3. **Comprehensive upgrade of infrastructure**
+- Fluid operator normalization and retirement. [#71789](https://github.com/PaddlePaddle/Paddle/pull/71789), [#71818](https://github.com/PaddlePaddle/Paddle/pull/71818), [#71808](https://github.com/PaddlePaddle/Paddle/pull/71808), [#71860](https://github.com/PaddlePaddle/Paddle/pull/71860), [#71806](https://github.com/PaddlePaddle/Paddle/pull/71806), [#72011](https://github.com/PaddlePaddle/Paddle/pull/72011), [#72043](https://github.com/PaddlePaddle/Paddle/pull/72043), [#72034](https://github.com/PaddlePaddle/Paddle/pull/72034), [#72047](https://github.com/PaddlePaddle/Paddle/pull/72047), [#72056](https://github.com/PaddlePaddle/Paddle/pull/72056), [#72087](https://github.com/PaddlePaddle/Paddle/pull/72087), [#72086](https://github.com/PaddlePaddle/Paddle/pull/72086), [#72083](https://github.com/PaddlePaddle/Paddle/pull/72083), [#72079](https://github.com/PaddlePaddle/Paddle/pull/72079), [#72078](https://github.com/PaddlePaddle/Paddle/pull/72078), [#72076](https://github.com/PaddlePaddle/Paddle/pull/72076), [#72057](https://github.com/PaddlePaddle/Paddle/pull/72057), [#72077](https://github.com/PaddlePaddle/Paddle/pull/72077), [#72096](https://github.com/PaddlePaddle/Paddle/pull/72096), [#72085](https://github.com/PaddlePaddle/Paddle/pull/72085), [#72092](https://github.com/PaddlePaddle/Paddle/pull/72092), [#72110](https://github.com/PaddlePaddle/Paddle/pull/72110), [#72127](https://github.com/PaddlePaddle/Paddle/pull/72127), [#72111](https://github.com/PaddlePaddle/Paddle/pull/72111), [#72126](https://github.com/PaddlePaddle/Paddle/pull/72126), [#72135](https://github.com/PaddlePaddle/Paddle/pull/72135), [#72112](https://github.com/PaddlePaddle/Paddle/pull/72112), [#72131](https://github.com/PaddlePaddle/Paddle/pull/72131), [#70358](https://github.com/PaddlePaddle/Paddle/pull/70358), [#72125](https://github.com/PaddlePaddle/Paddle/pull/72125), [#72171](https://github.com/PaddlePaddle/Paddle/pull/72171), [#72160](https://github.com/PaddlePaddle/Paddle/pull/72160), [#72188](https://github.com/PaddlePaddle/Paddle/pull/72188), [#72197](https://github.com/PaddlePaddle/Paddle/pull/72197)
-- OneDNN has been upgraded to version 3.6, significantly enhancing CPU inference performance
-- Model loading speed optimized by over 40%, supporting fast loading of PIR models
-- Improve distributed inference support and fix allreduce data type issues
+## 6. Performance
### New Features
-- Support Paddle-TensorRT based on PaddlePaddle's new generation of intermediate representation (PIR)
-- Development of core basic execution mechanism functions and code optimization. [#64995](https://github.com/PaddlePaddle/Paddle/pull/64995), [#67054](https://github.com/PaddlePaddle/Paddle/pull/67054), [#67660](https://github.com/PaddlePaddle/Paddle/pull/67660), [#67755](https://github.com/PaddlePaddle/Paddle/pull/67755), [#70762](https://github.com/PaddlePaddle/Paddle/pull/70762),
-- Development of operator Marker and Converter. [#67753](https://github.com/PaddlePaddle/Paddle/pull/67753),[#67956](https://github.com/PaddlePaddle/Paddle/pull/67956),[#68084](https://github.com/PaddlePaddle/Paddle/pull/68084),[#67974](https://github.com/PaddlePaddle/Paddle/pull/67974),[#68395](https://github.com/PaddlePaddle/Paddle/pull/68395),[#68216](https://github.com/PaddlePaddle/Paddle/pull/68216),[#68529](https://github.com/PaddlePaddle/Paddle/pull/68529),[#68608](https://github.com/PaddlePaddle/Paddle/pull/68608), [#68663](https://github.com/PaddlePaddle/Paddle/pull/68663),[#68757](https://github.com/PaddlePaddle/Paddle/pull/68757),[#68614](https://github.com/PaddlePaddle/Paddle/pull/68614),[#68783](https://github.com/PaddlePaddle/Paddle/pull/68783),[#68775](https://github.com/PaddlePaddle/Paddle/pull/68775),[#68839](https://github.com/PaddlePaddle/Paddle/pull/68839),[#68686](https://github.com/PaddlePaddle/Paddle/pull/68686),[#68840](https://github.com/PaddlePaddle/Paddle/pull/68840),[#68941](https://github.com/PaddlePaddle/Paddle/pull/68941),[#69015](https://github.com/PaddlePaddle/Paddle/pull/69015),[#69038](https://github.com/PaddlePaddle/Paddle/pull/69038),[#69117](https://github.com/PaddlePaddle/Paddle/pull/69117),[#69208](https://github.com/PaddlePaddle/Paddle/pull/69208),[#69315](https://github.com/PaddlePaddle/Paddle/pull/69315),[#69261](https://github.com/PaddlePaddle/Paddle/pull/69261),[#68878](https://github.com/PaddlePaddle/Paddle/pull/68878),[#69705](https://github.com/PaddlePaddle/Paddle/pull/69705),[#69706](https://github.com/PaddlePaddle/Paddle/pull/69706),[#70170](https://github.com/PaddlePaddle/Paddle/pull/70170),[#70267](https://github.com/PaddlePaddle/Paddle/pull/70267),[#70429](https://github.com/PaddlePaddle/Paddle/pull/70429),[#69330](https://github.com/PaddlePaddle/Paddle/pull/69330),[#70507](https://github.com/PaddlePaddle/Paddle/pull/70507),[#70535](https://github.com/PaddlePaddle/Paddle/pull/70535),[#70667](https://github.com/Pa
ddlePaddle/Paddle/pull/70667),[#70816](https://github.com/PaddlePaddle/Paddle/pull/70816),[#70826](https://github.com/PaddlePaddle/Paddle/pull/70826),[#70955](https://github.com/PaddlePaddle/Paddle/pull/70955),[#71028](https://github.com/PaddlePaddle/Paddle/pull/71028),[#71013](https://github.com/PaddlePaddle/Paddle/pull/71013),[#71157](https://github.com/PaddlePaddle/Paddle/pull/71157),[#71231](https://github.com/PaddlePaddle/Paddle/pull/71231),[#69199](https://github.com/PaddlePaddle/Paddle/pull/69199),[#68956](https://github.com/PaddlePaddle/Paddle/pull/68956),[#66658](https://github.com/PaddlePaddle/Paddle/pull/66658),[#66811](https://github.com/PaddlePaddle/Paddle/pull/66811),[#67519](https://github.com/PaddlePaddle/Paddle/pull/67519),[#67877](https://github.com/PaddlePaddle/Paddle/pull/67877),[#68090](https://github.com/PaddlePaddle/Paddle/pull/68090),[#69086](https://github.com/PaddlePaddle/Paddle/pull/69086),[#68787](https://github.com/PaddlePaddle/Paddle/pull/68787),[#68778](https://github.com/PaddlePaddle/Paddle/pull/68778),[#69318](https://github.com/PaddlePaddle/Paddle/pull/69318),[#69995](https://github.com/PaddlePaddle/Paddle/pull/69995),[#70325](https://github.com/PaddlePaddle/Paddle/pull/70325),[#70817](https://github.com/PaddlePaddle/Paddle/pull/70817),[#70879](https://github.com/PaddlePaddle/Paddle/pull/70879),[#70875](https://github.com/PaddlePaddle/Paddle/pull/70875),[#71041](https://github.com/PaddlePaddle/Paddle/pull/71041),[#68876](https://github.com/PaddlePaddle/Paddle/pull/68876)
-- Support for Generic Plugin execution function. [#66634](https://github.com/PaddlePaddle/Paddle/pull/66634), [#70251](https://github.com/PaddlePaddle/Paddle/pull/70251)
-- Low-precision (FP16, INT8) function support. [#69597](https://github.com/PaddlePaddle/Paddle/pull/69597), [#71127](https://github.com/PaddlePaddle/Paddle/pull/71127),
-- Auxiliary functions such as the single test system and pass usage support have been improved [#67525](https://github.com/PaddlePaddle/Paddle/pull/67525), [#68034](https://github.com/PaddlePaddle/Paddle/pull/68034), [#71281](https://github.com/PaddlePaddle/Paddle/pull/71281), [#71235](https://github.com/PaddlePaddle/Paddle/pull/71235), [#67568](https://github.com/PaddlePaddle/Paddle/pull/67568), [#70139](https://github.com/PaddlePaddle/Paddle/pull/70139), [#70529](https://github.com/PaddlePaddle/Paddle/pull/70529)
-- Large model inference optimization
-- Added fused_moe function support (basic support/non-standard TopK/Hopper architecture) [#66084](https://github.com/PaddlePaddle/Paddle/pull/66084), [#67425](https://github.com/PaddlePaddle/Paddle/pull/67425), [#67732](https://github.com/PaddlePaddle/Paddle/pull/67732)
-- Support for mixed precision computation (GQA mixed precision/BF16 registration) [#65078](https://github.com/PaddlePaddle/Paddle/pull/65078), [#67769](https://github.com/PaddlePaddle/Paddle/pull/67769)
-- Added inference optimization features (dynamic graph inference/support for 128K long sequences) [#65962](https://github.com/PaddlePaddle/Paddle/pull/65962), [#70088](https://github.com/PaddlePaddle/Paddle/pull/70088)
-- Added implementation of quantization inference operator (FP8 W8A8 computation/weight-only int4 quantization) [#65441](https://github.com/PaddlePaddle/Paddle/pull/65441), [#64094](https://github.com/PaddlePaddle/Paddle/pull/64094)
-
-### Feature-complete
-
-- The functional mechanism of Inference is well-established under PIR
-- The executor supports loading .json models [#65223](https://github.com/PaddlePaddle/Paddle/pull/65223)
-- Support controllable PIR mode switch-on/off [#65596](https://github.com/PaddlePaddle/Paddle/pull/65596)
-- Improved reasoning mechanism of large models
-- Optimized gemm algorithm search (cublaslt global search/offline caching) [#65597](https://github.com/PaddlePaddle/Paddle/pull/65597), [#66132](https://github.com/PaddlePaddle/Paddle/pull/66132)
-- Enhance type system compatibility (PD_VISIT_FLOATING_AND_HALF_TYPES) [#71022](https://github.com/PaddlePaddle/Paddle/pull/71022)
-- Optimized attention mechanism (support for multiple blocks of MMHA/XPU) [#67211](https://github.com/PaddlePaddle/Paddle/pull/67211), [#68104](https://github.com/PaddlePaddle/Paddle/pull/68104)
-
-### Performance optimization
-
-- OneDNN has been upgraded to version 3.6, resulting in a general improvement in model inference performance on GNR/EMR devices [#69386](https://github.com/PaddlePaddle/Paddle/pull/69386)
-- Operator performance optimization (layer_norm/top_p_sampling) [#65711](https://github.com/PaddlePaddle/Paddle/pull/65711)
-- Model loading acceleration (regular/PIR model) [#69110](https://github.com/PaddlePaddle/Paddle/pull/69110), [#70219](https://github.com/PaddlePaddle/Paddle/pull/70219)
+- The `acc_steps` of `sharding_overlap` is configurable. [#72395](https://github.com/PaddlePaddle/Paddle/pull/72395)
### Bug fixes
-- Fixed issues related to Predictor when saving/loading PIR models. [#65180](https://github.com/PaddlePaddle/Paddle/pull/65180), [#65019](https://github.com/PaddlePaddle/Paddle/pull/65019), [#65714](https://github.com/PaddlePaddle/Paddle/pull/65714), [#69619](https://github.com/PaddlePaddle/Paddle/pull/69619), [#67570](https://github.com/PaddlePaddle/Paddle/pull/67570), [#65595](https://github.com/PaddlePaddle/Paddle/pull/65595), [#69200](https://github.com/PaddlePaddle/Paddle/pull/69200)
-- Fixed execution issues of reasoning unit tests in scenarios such as PIR and multiple hardware configurations. [#65763](https://github.com/PaddlePaddle/Paddle/pull/65763),[#66481](https://github.com/PaddlePaddle/Paddle/pull/66481),[#67105](https://github.com/PaddlePaddle/Paddle/pull/67105),[#67248](https://github.com/PaddlePaddle/Paddle/pull/67248),[#67470](https://github.com/PaddlePaddle/Paddle/pull/67470),[#67638](https://github.com/PaddlePaddle/Paddle/pull/67638),[#68135](https://github.com/PaddlePaddle/Paddle/pull/68135),[#68191](https://github.com/PaddlePaddle/Paddle/pull/68191),[#68211](https://github.com/PaddlePaddle/Paddle/pull/68211),[#68160](https://github.com/PaddlePaddle/Paddle/pull/68160),[#68185](https://github.com/PaddlePaddle/Paddle/pull/68185),[#68127](https://github.com/PaddlePaddle/Paddle/pull/68127),[#68887](https://github.com/PaddlePaddle/Paddle/pull/68887),[#69191](https://github.com/PaddlePaddle/Paddle/pull/69191), [#70961](https://github.com/PaddlePaddle/Paddle/pull/70961),[#68020](https://github.com/PaddlePaddle/Paddle/pull/68020),[#67923](https://github.com/PaddlePaddle/Paddle/pull/67923),[#67963](https://github.com/PaddlePaddle/Paddle/pull/67963),[#68482](https://github.com/PaddlePaddle/Paddle/pull/68482),[#68546](https://github.com/PaddlePaddle/Paddle/pull/68546),[#68593](https://github.com/PaddlePaddle/Paddle/pull/68593),[#68793](https://github.com/PaddlePaddle/Paddle/pull/68793)
-- Fixed issues related to Paddle TensorRT conversion and execution. [#66932](https://github.com/PaddlePaddle/Paddle/pull/66932),[#66655](https://github.com/PaddlePaddle/Paddle/pull/66655),[#67274](https://github.com/PaddlePaddle/Paddle/pull/67274),[#67504](https://github.com/PaddlePaddle/Paddle/pull/67504),[#65780](https://github.com/PaddlePaddle/Paddle/pull/65780),[#68170](https://github.com/PaddlePaddle/Paddle/pull/68170),[#68647](https://github.com/PaddlePaddle/Paddle/pull/68647),[#68776](https://github.com/PaddlePaddle/Paddle/pull/68776),[#69573](https://github.com/PaddlePaddle/Paddle/pull/69573),[#69598](https://github.com/PaddlePaddle/Paddle/pull/69598),[#69510](https://github.com/PaddlePaddle/Paddle/pull/69510),[#69864](https://github.com/PaddlePaddle/Paddle/pull/69864),[#69885](https://github.com/PaddlePaddle/Paddle/pull/69885),[#70161](https://github.com/PaddlePaddle/Paddle/pull/70161),[#70116](https://github.com/PaddlePaddle/Paddle/pull/70116),[#70791](https://github.com/PaddlePaddle/Paddle/pull/70791),[#70801](https://github.com/PaddlePaddle/Paddle/pull/70801),[#70824](https://github.com/PaddlePaddle/Paddle/pull/70824),[#70939](https://github.com/PaddlePaddle/Paddle/pull/70939), [#71143](https://github.com/PaddlePaddle/Paddle/pull/71143),[#71154](https://github.com/PaddlePaddle/Paddle/pull/71154),[#71163](https://github.com/PaddlePaddle/Paddle/pull/71163),[#71183](https://github.com/PaddlePaddle/Paddle/pull/71183),[#71233](https://github.com/PaddlePaddle/Paddle/pull/71233),[#71287](https://github.com/PaddlePaddle/Paddle/pull/71287),[#71319](https://github.com/PaddlePaddle/Paddle/pull/71319),[#67720](https://github.com/PaddlePaddle/Paddle/pull/67720),[#69671](https://github.com/PaddlePaddle/Paddle/pull/69671),[#70168](https://github.com/PaddlePaddle/Paddle/pull/70168),[#69957](https://github.com/PaddlePaddle/Paddle/pull/69957)
-- Fixed issues related to Paddle Inference compilation and linking. [#65846](https://github.com/PaddlePaddle/Paddle/pull/65846), [#67081](https://github.com/PaddlePaddle/Paddle/pull/67081), [#63184](https://github.com/PaddlePaddle/Paddle/pull/63184)
-- Fixed quantization issues. [#67839](https://github.com/PaddlePaddle/Paddle/pull/67839), [#68049](https://github.com/PaddlePaddle/Paddle/pull/68049), [#70099](https://github.com/PaddlePaddle/Paddle/pull/70099), [#64878](https://github.com/PaddlePaddle/Paddle/pull/64878), [#65717](https://github.com/PaddlePaddle/Paddle/pull/65717), [#67552](https://github.com/PaddlePaddle/Paddle/pull/67552), [#67715](https://github.com/PaddlePaddle/Paddle/pull/67715)
-- Fixed OneDNN inference issues. [#67836](https://github.com/PaddlePaddle/Paddle/pull/67836), [#68021](https://github.com/PaddlePaddle/Paddle/pull/68021), [#68132](https://github.com/PaddlePaddle/Paddle/pull/68132), [#71426](https://github.com/PaddlePaddle/Paddle/pull/71426), [#68057](https://github.com/PaddlePaddle/Paddle/pull/68057)
-- Fixed memory issues. [#68631](https://github.com/PaddlePaddle/Paddle/pull/68631), [#69129](https://github.com/PaddlePaddle/Paddle/pull/69129), [#70314](https://github.com/PaddlePaddle/Paddle/pull/70314), [#67863](https://github.com/PaddlePaddle/Paddle/pull/67863)
-- Paddle Inference supports bug fixes for OpenVINO issues. [#70212](https://github.com/PaddlePaddle/Paddle/pull/70212), [#70288](https://github.com/PaddlePaddle/Paddle/pull/70288),
-- Fixed issues related to Pass. [#65349](https://github.com/PaddlePaddle/Paddle/pull/65349),[#65421](https://github.com/PaddlePaddle/Paddle/pull/65421),[#65677](https://github.com/PaddlePaddle/Paddle/pull/65677),[#66850](https://github.com/PaddlePaddle/Paddle/pull/66850),[#67443](https://github.com/PaddlePaddle/Paddle/pull/67443),[#67620](https://github.com/PaddlePaddle/Paddle/pull/67620),[#68158](https://github.com/PaddlePaddle/Paddle/pull/68158),[#68642](https://github.com/PaddlePaddle/Paddle/pull/68642),[#68837](https://github.com/PaddlePaddle/Paddle/pull/68837),[#68880](https://github.com/PaddlePaddle/Paddle/pull/68880),[#68935](https://github.com/PaddlePaddle/Paddle/pull/68935),[#69112](https://github.com/PaddlePaddle/Paddle/pull/69112),[#69205](https://github.com/PaddlePaddle/Paddle/pull/69205),[#69242](https://github.com/PaddlePaddle/Paddle/pull/69242),[#69352](https://github.com/PaddlePaddle/Paddle/pull/69352),[#69421](https://github.com/PaddlePaddle/Paddle/pull/69421),[#69690](https://github.com/PaddlePaddle/Paddle/pull/69690),
-- Fixed other issues. [#70237](https://github.com/PaddlePaddle/Paddle/pull/70237), [#68173](https://github.com/PaddlePaddle/Paddle/pull/68173)
-- Fixed issues related to fused_moe (testing/GEMM/WINT4/multi-architecture compatibility/Bias optional) [#67353](https://github.com/PaddlePaddle/Paddle/pull/67353), [#67396](https://github.com/PaddlePaddle/Paddle/pull/67396), [#67717](https://github.com/PaddlePaddle/Paddle/pull/67717), [#67794](https://github.com/PaddlePaddle/Paddle/pull/67794), [#67783](https://github.com/PaddlePaddle/Paddle/pull/67783)
-- Fixed issues in the block_attention series (GQA discrepancy/out-of-bounds risk/multi-head support) [#67175](https://github.com/PaddlePaddle/Paddle/pull/67175), [#69001](https://github.com/PaddlePaddle/Paddle/pull/69001), [#70763](https://github.com/PaddlePaddle/Paddle/pull/70763)
-- Fixed PIR-related issues (layout conversion/BF16 replacement errors) [#66977](https://github.com/PaddlePaddle/Paddle/pull/66977), [#67830](https://github.com/PaddlePaddle/Paddle/pull/67830)
-- Fixed distributed-related issues (allreduce data type/parameter synchronization) [#67449](https://github.com/PaddlePaddle/Paddle/pull/67449), [#69157](https://github.com/PaddlePaddle/Paddle/pull/69157)
-- Fixed kernel execution issues (forward-backward conflict/default stream argsort) [#67218](https://github.com/PaddlePaddle/Paddle/pull/67218), [#68374](https://github.com/PaddlePaddle/Paddle/pull/68374)
-- Other key fixes (reducing the size of the C++ library/fixing RoPE calculation in NeoX format/fixing static graph execution) [#66041](https://github.com/PaddlePaddle/Paddle/pull/66041), [#66583](https://github.com/PaddlePaddle/Paddle/pull/66583), [#67580](https://github.com/PaddlePaddle/Paddle/pull/67580)
+- Fixed the `inplace` issue of operator `c_softmax_with_cross_entropy_grad`. [#72366](https://github.com/PaddlePaddle/Paddle/pull/72366)
-### Other modifications
+### Improvements
-- Code cleanup and maintenance (API deprecation/compilation warning fixes) [#68048](https://github.com/PaddlePaddle/Paddle/pull/68048), [#70384](https://github.com/PaddlePaddle/Paddle/pull/70384)
-- Third-party integration optimization (OpenVINO submodule management) [#70313](https://github.com/PaddlePaddle/Paddle/pull/70313), [#70425](https://github.com/PaddlePaddle/Paddle/pull/70425)
+- Performance optimization and acceleration: Enabled cuDNN support for deep convolution, enhancing convolution operation efficiency. Updated pooling operation strategy and optimized permute memory operations to reduce CUDA memory usage. Optimized printing speed, accelerating debugging and log output processes. [#71796](https://github.com/PaddlePaddle/Paddle/pull/71796), [#73442](https://github.com/PaddlePaddle/Paddle/pull/73442), [#73563](https://github.com/PaddlePaddle/Paddle/pull/73563)
+- Feature enhancements and operation support: Added the `masked_fill` operation and Boolean index optimization to improve tensor masking. Implemented the `index_elementwise` operation to support index-based element-wise operations. Added pooling and reshape execution strategies to make model operations more flexible. [#72788](https://github.com/PaddlePaddle/Paddle/pull/72788), [#72942](https://github.com/PaddlePaddle/Paddle/pull/72942)
+- Bug fixes and stability improvements: Fixed a partial state support issue with fused_rms_norm in SPMD parallel mode. Corrected index errors in output dimension calculation and IndexGetStride during the slice operation to ensure computational correctness. [#72118](https://github.com/PaddlePaddle/Paddle/pull/72118), [#72223](https://github.com/PaddlePaddle/Paddle/pull/72223), [#73184](https://github.com/PaddlePaddle/Paddle/pull/73184), [#73237](https://github.com/PaddlePaddle/Paddle/pull/73237), [#73054](https://github.com/PaddlePaddle/Paddle/pull/73054)
+- Faster Guard adaptation: Reduce SOT end-to-end link overhead. [#71900](https://github.com/PaddlePaddle/Paddle/pull/71900), [#71979](https://github.com/PaddlePaddle/Paddle/pull/71979), [#72081](https://github.com/PaddlePaddle/Paddle/pull/72081), [#72327](https://github.com/PaddlePaddle/Paddle/pull/72327), [#72564](https://github.com/PaddlePaddle/Paddle/pull/72564), [#72823](https://github.com/PaddlePaddle/Paddle/pull/72823)
+- Performance optimization and acceleration: Optimize the operator scheduling strategy. Upgrade Flash Attention to v3 to reduce computational overhead. Fix model performance bottlenecks and improve inference and training speed. [#71937](https://github.com/PaddlePaddle/Paddle/pull/71937), [#71828](https://github.com/PaddlePaddle/Paddle/pull/71828), [#71461](https://github.com/PaddlePaddle/Paddle/pull/71461), [#72039](https://github.com/PaddlePaddle/Paddle/pull/72039), [#72228](https://github.com/PaddlePaddle/Paddle/pull/72228), [#72225](https://github.com/PaddlePaddle/Paddle/pull/72225), [#72623](https://github.com/PaddlePaddle/Paddle/pull/72623), [#72666](https://github.com/PaddlePaddle/Paddle/pull/72666), [#73147](https://github.com/PaddlePaddle/Paddle/pull/73147), [#73393](https://github.com/PaddlePaddle/Paddle/pull/73393)
+- Parallel computing: Optimize the grid re-sharding strategy in automatic parallelism, integrate communication and optimize logic in the Sharding Stage, enhance the stability of distributed training, and reduce the communication overhead of distributed training. [#71969](https://github.com/PaddlePaddle/Paddle/pull/71969), [#72120](https://github.com/PaddlePaddle/Paddle/pull/72120), [#73279](https://github.com/PaddlePaddle/Paddle/pull/73279), [#73406](https://github.com/PaddlePaddle/Paddle/pull/73406)
-## 8. Hardware adaptation
+- Feature enhancements and fixes: Optimized operator indexing and kernel scheduling logic. [#72625](https://github.com/PaddlePaddle/Paddle/pull/72625), [#72741](https://github.com/PaddlePaddle/Paddle/pull/72741), [#73082](https://github.com/PaddlePaddle/Paddle/pull/73082), [#73501](https://github.com/PaddlePaddle/Paddle/pull/73501)
-Continuously improve and upgrade the functions of platforms such as Kunlun and Haiguang to enhance user experience
+- Model and operation support: Supports deep convolution in NHWC format, adapting to more hardware memory layouts. [#72121](https://github.com/PaddlePaddle/Paddle/pull/72121)
-### New Features
+## 7. Custom Device
-The addition of operations (ops) and improvement of functions on Kunlun Core XPU involve the following ops: flash attention/flash_attn_unpadded, multinomial, matmul, repeat_interleave, logsumexp, index_put_grad, mean_grad, pow, pow_grad, rsqrt, full, rms_norm, rms_norm_grad, put_along_axis, Cumsum, argmin, masked_select/grad, expand_v2/grad, all2all, expand, reduce_sum, reduce_max, reduce_min, moe, fused_linear_param_grad_add, adamw, clip/clip_grad, tan, acos, blha_get_max_len, gather/gather_grad, scatter/scatter_grad, round, index_select/sindex_select_grad, isfinite, isinf, quantize_linear, dequantize_linear, conv3d_transpose, logsumexp_grad, index_add_grad, eye, gather_element, tril, triu, set_value_grad, argmax, take_along_axis, etc
-[#65413](https://github.com/PaddlePaddle/Paddle/pull/65413), [#64846](https://github.com/PaddlePaddle/Paddle/pull/64846), [#65656](https://github.com/PaddlePaddle/Paddle/pull/65656), [#65963](https://github.com/PaddlePaddle/Paddle/pull/65963), [#66143](https://github.com/PaddlePaddle/Paddle/pull/66143), [#66482](https://github.com/PaddlePaddle/Paddle/pull/66482), [#66585](https://github.com/PaddlePaddle/Paddle/pull/66585), [#67077](https://github.com/PaddlePaddle/Paddle/pull/67077), [#67173](https://github.com/PaddlePaddle/Paddle/pull/67173), [#67551](https://github.com/PaddlePaddle/Paddle/pull/67551), [#63989](https://github.com/PaddlePaddle/Paddle/pull/63989), [#67919](https://github.com/PaddlePaddle/Paddle/pull/67919), [#68052](https://github.com/PaddlePaddle/Paddle/pull/68052), [#68176](https://github.com/PaddlePaddle/Paddle/pull/68176), [#68408](https://github.com/PaddlePaddle/Paddle/pull/68408), [#68454](https://github.com/PaddlePaddle/Paddle/pull/68454), [#68478](https://github.com/PaddlePaddle/Paddle/pull/68478), [#68473](https://github.com/PaddlePaddle/Paddle/pull/68473), [#68453](https://github.com/PaddlePaddle/Paddle/pull/68453), [#68770](https://github.com/PaddlePaddle/Paddle/pull/68770), [#68933](https://github.com/PaddlePaddle/Paddle/pull/68933), [#69042](https://github.com/PaddlePaddle/Paddle/pull/69042), [#68713](https://github.com/PaddlePaddle/Paddle/pull/68713), [#69368](https://github.com/PaddlePaddle/Paddle/pull/69368), [#69723](https://github.com/PaddlePaddle/Paddle/pull/69723), [#69767](https://github.com/PaddlePaddle/Paddle/pull/69767), [#69898](https://github.com/PaddlePaddle/Paddle/pull/69898), [#69970](https://github.com/PaddlePaddle/Paddle/pull/69970), [#69771](https://github.com/PaddlePaddle/Paddle/pull/69771), [#70176](https://github.com/PaddlePaddle/Paddle/pull/70176), [#70428](https://github.com/PaddlePaddle/Paddle/pull/70428), [#70573](https://github.com/PaddlePaddle/Paddle/pull/70573), 
[#70576](https://github.com/PaddlePaddle/Paddle/pull/70576), [#70633](https://github.com/PaddlePaddle/Paddle/pull/70633), [#70114](https://github.com/PaddlePaddle/Paddle/pull/70114), [#70627](https://github.com/PaddlePaddle/Paddle/pull/70627), [#71038](https://github.com/PaddlePaddle/Paddle/pull/71038), [#71132](https://github.com/PaddlePaddle/Paddle/pull/71132), [#71228](https://github.com/PaddlePaddle/Paddle/pull/71228), [#71274](https://github.com/PaddlePaddle/Paddle/pull/71274), [#71364](https://github.com/PaddlePaddle/Paddle/pull/71364), [#71375](https://github.com/PaddlePaddle/Paddle/pull/71375), [#71431](https://github.com/PaddlePaddle/Paddle/pull/71431), [#71451](https://github.com/PaddlePaddle/Paddle/pull/71451), [#67585](https://github.com/PaddlePaddle/Paddle/pull/67585), [#67637](https://github.com/PaddlePaddle/Paddle/pull/67637), [#67914](https://github.com/PaddlePaddle/Paddle/pull/67914), [#67641](https://github.com/PaddlePaddle/Paddle/pull/67641), [#67913](https://github.com/PaddlePaddle/Paddle/pull/67913), [#67955](https://github.com/PaddlePaddle/Paddle/pull/67955), [#68411](https://github.com/PaddlePaddle/Paddle/pull/68411), [#68560](https://github.com/PaddlePaddle/Paddle/pull/68560), [#68423](https://github.com/PaddlePaddle/Paddle/pull/68423), [#68894](https://github.com/PaddlePaddle/Paddle/pull/68894), [#71053](https://github.com/PaddlePaddle/Paddle/pull/71053), [#71047](https://github.com/PaddlePaddle/Paddle/pull/71047), [#69056](https://github.com/PaddlePaddle/Paddle/pull/69056), [#70843](https://github.com/PaddlePaddle/Paddle/pull/70843), [#65653](https://github.com/PaddlePaddle/Paddle/pull/65653), [#68023](https://github.com/PaddlePaddle/Paddle/pull/68023), [#67780](https://github.com/PaddlePaddle/Paddle/pull/67780), [#68622](https://github.com/PaddlePaddle/Paddle/pull/68622), [#67215](https://github.com/PaddlePaddle/Paddle/pull/67215)
+Optimize hardware mechanisms and provide a solution for reusing CUDA-like hardware kernels.
-Add support for rocsolver and warpctc on Haiguang DCU, and carry out the addition of OPs and improvement of functions. The involved ops include: flash_attention, hipblaslt, fastgelu, multiclass_nms3
-
-[#68066](https://github.com/PaddlePaddle/Paddle/pull/68066), [#69457](https://github.com/PaddlePaddle/Paddle/pull/69457), [#68603](https://github.com/PaddlePaddle/Paddle/pull/68603), [#65599](https://github.com/PaddlePaddle/Paddle/pull/65599), [#70587](https://github.com/PaddlePaddle/Paddle/pull/70587), [#71337](https://github.com/PaddlePaddle/Paddle/pull/71337), [#70173](https://github.com/PaddlePaddle/Paddle/pull/70173)
-
-### Bug fixes
-
-Bug fix for OP on Kunlun Core XPU
-[#65020](https://github.com/PaddlePaddle/Paddle/pull/65020), [#65251](https://github.com/PaddlePaddle/Paddle/pull/65251), [#65418](https://github.com/PaddlePaddle/Paddle/pull/65418), [#65387](https://github.com/PaddlePaddle/Paddle/pull/65387), [#65525](https://github.com/PaddlePaddle/Paddle/pull/65525), [#65613](https://github.com/PaddlePaddle/Paddle/pull/65613), [#65533](https://github.com/PaddlePaddle/Paddle/pull/65533), [#65705](https://github.com/PaddlePaddle/Paddle/pull/65705), [#65915](https://github.com/PaddlePaddle/Paddle/pull/65915), [#66238](https://github.com/PaddlePaddle/Paddle/pull/66238), [#66485](https://github.com/PaddlePaddle/Paddle/pull/66485), [#67349](https://github.com/PaddlePaddle/Paddle/pull/67349), [#67372](https://github.com/PaddlePaddle/Paddle/pull/67372), [#67276](https://github.com/PaddlePaddle/Paddle/pull/67276), [#67460](https://github.com/PaddlePaddle/Paddle/pull/67460), [#67496](https://github.com/PaddlePaddle/Paddle/pull/67496), [#67530](https://github.com/PaddlePaddle/Paddle/pull/67530), [#67828](https://github.com/PaddlePaddle/Paddle/pull/67828), [#68010](https://github.com/PaddlePaddle/Paddle/pull/68010), [#68157](https://github.com/PaddlePaddle/Paddle/pull/68157), [#68172](https://github.com/PaddlePaddle/Paddle/pull/68172), [#68388](https://github.com/PaddlePaddle/Paddle/pull/68388), [#68213](https://github.com/PaddlePaddle/Paddle/pull/68213), [#68501](https://github.com/PaddlePaddle/Paddle/pull/68501), [#68504](https://github.com/PaddlePaddle/Paddle/pull/68504), [#68585](https://github.com/PaddlePaddle/Paddle/pull/68585), [#69229](https://github.com/PaddlePaddle/Paddle/pull/69229), [#69374](https://github.com/PaddlePaddle/Paddle/pull/69374), [#69424](https://github.com/PaddlePaddle/Paddle/pull/69424), [#69440](https://github.com/PaddlePaddle/Paddle/pull/69440), [#69614](https://github.com/PaddlePaddle/Paddle/pull/69614), [#68542](https://github.com/PaddlePaddle/Paddle/pull/68542), 
[#69990](https://github.com/PaddlePaddle/Paddle/pull/69990), [#70351](https://github.com/PaddlePaddle/Paddle/pull/70351), [#70479](https://github.com/PaddlePaddle/Paddle/pull/70479), [#70431](https://github.com/PaddlePaddle/Paddle/pull/70431), [#70638](https://github.com/PaddlePaddle/Paddle/pull/70638), [#70856](https://github.com/PaddlePaddle/Paddle/pull/70856), [#70974](https://github.com/PaddlePaddle/Paddle/pull/70974), [#70973](https://github.com/PaddlePaddle/Paddle/pull/70973), [#71027](https://github.com/PaddlePaddle/Paddle/pull/71027), [#71062](https://github.com/PaddlePaddle/Paddle/pull/71062), [#71115](https://github.com/PaddlePaddle/Paddle/pull/71115), [#71110](https://github.com/PaddlePaddle/Paddle/pull/71110), [#70858](https://github.com/PaddlePaddle/Paddle/pull/70858), [#71147](https://github.com/PaddlePaddle/Paddle/pull/71147), [#71212](https://github.com/PaddlePaddle/Paddle/pull/71212), [#71361](https://github.com/PaddlePaddle/Paddle/pull/71361), [#71423](https://github.com/PaddlePaddle/Paddle/pull/71423), [#70859](https://github.com/PaddlePaddle/Paddle/pull/70859), [#71492](https://github.com/PaddlePaddle/Paddle/pull/71492), [#71493](https://github.com/PaddlePaddle/Paddle/pull/71493), [#69826](https://github.com/PaddlePaddle/Paddle/pull/69826), [#67341](https://github.com/PaddlePaddle/Paddle/pull/67341), [#68906](https://github.com/PaddlePaddle/Paddle/pull/68906), [#71171](https://github.com/PaddlePaddle/Paddle/pull/71171)
-
-Bug fix for OP on Haiguang DCU
-[#69617](https://github.com/PaddlePaddle/Paddle/pull/69617), [#65716](https://github.com/PaddlePaddle/Paddle/pull/65716), [#66630](https://github.com/PaddlePaddle/Paddle/pull/66630), [#65399](https://github.com/PaddlePaddle/Paddle/pull/65399)
-
-### Performance optimization
-
-Kunlun Core XPU upgrades the functions of basic components such as streams and optimizes the performance of certain operations.
-[#65102](https://github.com/PaddlePaddle/Paddle/pull/65102), [#69727](https://github.com/PaddlePaddle/Paddle/pull/69727), [#69899](https://github.com/PaddlePaddle/Paddle/pull/69899), [#69942](https://github.com/PaddlePaddle/Paddle/pull/69942), [#70025](https://github.com/PaddlePaddle/Paddle/pull/70025), [#70640](https://github.com/PaddlePaddle/Paddle/pull/70640)
-
-### Upgrade of hardware underlying basic libraries
-
-The upgrade of the basic library supports Kunlun Core P800, as well as the support for basic components
-[#65494](https://github.com/PaddlePaddle/Paddle/pull/65494), [#65924](https://github.com/PaddlePaddle/Paddle/pull/65924), [#69752](https://github.com/PaddlePaddle/Paddle/pull/69752), [#70835](https://github.com/PaddlePaddle/Paddle/pull/70835), [#65554](https://github.com/PaddlePaddle/Paddle/pull/65554), [#66998](https://github.com/PaddlePaddle/Paddle/pull/66998), [#65278](https://github.com/PaddlePaddle/Paddle/pull/65278), [#70614](https://github.com/PaddlePaddle/Paddle/pull/70614), [#71012](https://github.com/PaddlePaddle/Paddle/pull/71012), [#71178](https://github.com/PaddlePaddle/Paddle/pull/71178), [#71168](https://github.com/PaddlePaddle/Paddle/pull/71168), [#68740](https://github.com/PaddlePaddle/Paddle/pull/68740), [#71100](https://github.com/PaddlePaddle/Paddle/pull/71100), [#65221](https://github.com/PaddlePaddle/Paddle/pull/65221), [#67983](https://github.com/PaddlePaddle/Paddle/pull/67983)
-
-### Others
+### New Features
-Modifications to related modules such as op test
-[#65654](https://github.com/PaddlePaddle/Paddle/pull/65654), [#66233](https://github.com/PaddlePaddle/Paddle/pull/66233), [#66728](https://github.com/PaddlePaddle/Paddle/pull/66728), [#67959](https://github.com/PaddlePaddle/Paddle/pull/67959), [#68169](https://github.com/PaddlePaddle/Paddle/pull/68169), [#68418](https://github.com/PaddlePaddle/Paddle/pull/68418), [#68434](https://github.com/PaddlePaddle/Paddle/pull/68434), [#68445](https://github.com/PaddlePaddle/Paddle/pull/68445), [#68877](https://github.com/PaddlePaddle/Paddle/pull/68877), [#68993](https://github.com/PaddlePaddle/Paddle/pull/68993), [#69006](https://github.com/PaddlePaddle/Paddle/pull/69006), [#70471](https://github.com/PaddlePaddle/Paddle/pull/70471), [#70706](https://github.com/PaddlePaddle/Paddle/pull/70706), [#67777](https://github.com/PaddlePaddle/Paddle/pull/67777), [#65698](https://github.com/PaddlePaddle/Paddle/pull/65698), [#68433](https://github.com/PaddlePaddle/Paddle/pull/68433), [#65689](https://github.com/PaddlePaddle/Paddle/pull/65689)
+Building on the CustomDevice integration solution, we introduce a low-cost support solution for CUDA-like hardware backends. These backends can be plugged into Paddle in a modular manner and can reuse, at low cost, the majority of the CUDA kernels that Paddle provides for the NVIDIA ecosystem. They can also be decoupled from feature upgrades within the Paddle framework, which significantly reduces the cost of hardware backend integration and iteration, makes adoption more attractive to users, and fosters a positive collaborative ecosystem between Paddle and hardware manufacturers.
+[#72604](https://github.com/PaddlePaddle/Paddle/pull/72604), [#72668](https://github.com/PaddlePaddle/Paddle/pull/72668), [#72758](https://github.com/PaddlePaddle/Paddle/pull/72758), [#72865](https://github.com/PaddlePaddle/Paddle/pull/72865), [#72910](https://github.com/PaddlePaddle/Paddle/pull/72910), [#73033](https://github.com/PaddlePaddle/Paddle/pull/73033), [#73145](https://github.com/PaddlePaddle/Paddle/pull/73145), [#73281](https://github.com/PaddlePaddle/Paddle/pull/73281), [#73079](https://github.com/PaddlePaddle/Paddle/pull/73079)
-## 9. Environment update
+Enhance XPU fundamental capabilities: add kernels, expand data types, and supplement branches in the XPU environment
+[#71424](https://github.com/PaddlePaddle/Paddle/pull/71424), [#71809](https://github.com/PaddlePaddle/Paddle/pull/71809), [#71594](https://github.com/PaddlePaddle/Paddle/pull/71594), [#71779](https://github.com/PaddlePaddle/Paddle/pull/71779), [#71756](https://github.com/PaddlePaddle/Paddle/pull/71756), [#71573](https://github.com/PaddlePaddle/Paddle/pull/71573), [#71883](https://github.com/PaddlePaddle/Paddle/pull/71883), [#71954](https://github.com/PaddlePaddle/Paddle/pull/71954), [#71931](https://github.com/PaddlePaddle/Paddle/pull/71931), [#72280](https://github.com/PaddlePaddle/Paddle/pull/72280), [#72361](https://github.com/PaddlePaddle/Paddle/pull/72361), [#72406](https://github.com/PaddlePaddle/Paddle/pull/72406), [#72528](https://github.com/PaddlePaddle/Paddle/pull/72528), [#72752](https://github.com/PaddlePaddle/Paddle/pull/72752), [#72852](https://github.com/PaddlePaddle/Paddle/pull/72852), [#72982](https://github.com/PaddlePaddle/Paddle/pull/72982), [#73357](https://github.com/PaddlePaddle/Paddle/pull/73357), [#73414](https://github.com/PaddlePaddle/Paddle/pull/73414), [#73464](https://github.com/PaddlePaddle/Paddle/pull/73464), [#73234](https://github.com/PaddlePaddle/Paddle/pull/73234), [#71776](https://github.com/PaddlePaddle/Paddle/pull/71776)
-- We optimized the framework's stability and cross-platform compatibility, fixed issues related to test coverage and compilation environment compatibility, and enhanced support for multiple platforms such as Windows, XPU, and DCU. Simultaneously, we streamlined the code structure, removed obsolete code and unnecessary dependent libraries to reduce maintenance costs, upgraded key dependencies such as CUDA, further optimized the CI/CD process, improved build speed, and enhanced overall system stability.
+Extend the data types supported by DCU kernels
+[#73129](https://github.com/PaddlePaddle/Paddle/pull/73129)
### Bug Fixes
-- Improve the CI/CD process, fix test cases, resolve compilation and installation issues in different environments, and enhance the stability and cross-environment compatibility of the framework.
- [#65627](https://github.com/PaddlePaddle/Paddle/pull/65627), [#65736](https://github.com/PaddlePaddle/Paddle/pull/65736), [#65900](https://github.com/PaddlePaddle/Paddle/pull/65900), [#66069](https://github.com/PaddlePaddle/Paddle/pull/66069), [#67000](https://github.com/PaddlePaddle/Paddle/pull/67000), [#67312](https://github.com/PaddlePaddle/Paddle/pull/67312), [#67432](https://github.com/PaddlePaddle/Paddle/pull/67432), [#67540](https://github.com/PaddlePaddle/Paddle/pull/67540), [#67670](https://github.com/PaddlePaddle/Paddle/pull/67670), [#68449](https://github.com/PaddlePaddle/Paddle/pull/68449), [#70806](https://github.com/PaddlePaddle/Paddle/pull/70806), [#65665](https://github.com/PaddlePaddle/Paddle/pull/65665), [#65652](https://github.com/PaddlePaddle/Paddle/pull/65652), [#70644](https://github.com/PaddlePaddle/Paddle/pull/70644), [#68119](https://github.com/PaddlePaddle/Paddle/pull/68119), [#68466](https://github.com/PaddlePaddle/Paddle/pull/68466), [#68858](https://github.com/PaddlePaddle/Paddle/pull/68858), [#68788](https://github.com/PaddlePaddle/Paddle/pull/68788), [#68934](https://github.com/PaddlePaddle/Paddle/pull/68934), [#69883](https://github.com/PaddlePaddle/Paddle/pull/69883), [#69924](https://github.com/PaddlePaddle/Paddle/pull/69924), [#71187](https://github.com/PaddlePaddle/Paddle/pull/71187), [#70798](https://github.com/PaddlePaddle/Paddle/pull/70798), [#71248](https://github.com/PaddlePaddle/Paddle/pull/71248), [#70512](https://github.com/PaddlePaddle/Paddle/pull/70512), [#71363](https://github.com/PaddlePaddle/Paddle/pull/71363), [#71438](https://github.com/PaddlePaddle/Paddle/pull/71438), [#71291](https://github.com/PaddlePaddle/Paddle/pull/71291)
-
-### Improvement and Upgrade
+Fix XPU execution issues
+[#71852](https://github.com/PaddlePaddle/Paddle/pull/71852), [#71966](https://github.com/PaddlePaddle/Paddle/pull/71966), [#72005](https://github.com/PaddlePaddle/Paddle/pull/72005), [#71908](https://github.com/PaddlePaddle/Paddle/pull/71908), [#72431](https://github.com/PaddlePaddle/Paddle/pull/72431), [#72519](https://github.com/PaddlePaddle/Paddle/pull/72519), [#72734](https://github.com/PaddlePaddle/Paddle/pull/72734), [#72763](https://github.com/PaddlePaddle/Paddle/pull/72763), [#72762](https://github.com/PaddlePaddle/Paddle/pull/72762), [#72890](https://github.com/PaddlePaddle/Paddle/pull/72890), [#72867](https://github.com/PaddlePaddle/Paddle/pull/72867), [#73071](https://github.com/PaddlePaddle/Paddle/pull/73071), [#73004](https://github.com/PaddlePaddle/Paddle/pull/73004), [#72726](https://github.com/PaddlePaddle/Paddle/pull/72726), [#73113](https://github.com/PaddlePaddle/Paddle/pull/73113), [#73127](https://github.com/PaddlePaddle/Paddle/pull/73127), [#73025](https://github.com/PaddlePaddle/Paddle/pull/73025), [#73301](https://github.com/PaddlePaddle/Paddle/pull/73301), [#73292](https://github.com/PaddlePaddle/Paddle/pull/73292), [#73272](https://github.com/PaddlePaddle/Paddle/pull/73272), [#73305](https://github.com/PaddlePaddle/Paddle/pull/73305), [#73356](https://github.com/PaddlePaddle/Paddle/pull/73356), [#73438](https://github.com/PaddlePaddle/Paddle/pull/73438), [#72041](https://github.com/PaddlePaddle/Paddle/pull/72041), [#72275](https://github.com/PaddlePaddle/Paddle/pull/72275), [#72787](https://github.com/PaddlePaddle/Paddle/pull/72787), [#73504](https://github.com/PaddlePaddle/Paddle/pull/73504), [#73290](https://github.com/PaddlePaddle/Paddle/pull/73290)
-- Environmental upgrade
- [#69491](https://github.com/PaddlePaddle/Paddle/pull/69491), [#66560](https://github.com/PaddlePaddle/Paddle/pull/66560), [#65686](https://github.com/PaddlePaddle/Paddle/pull/65686), [#71177](https://github.com/PaddlePaddle/Paddle/pull/71177), [#71284](https://github.com/PaddlePaddle/Paddle/pull/71284), [#69791](https://github.com/PaddlePaddle/Paddle/pull/69791), [#69349](https://github.com/PaddlePaddle/Paddle/pull/69349), [#70944](https://github.com/PaddlePaddle/Paddle/pull/70944), [#65411](https://github.com/PaddlePaddle/Paddle/pull/65411)
-- Pipeline merging
- [#66815](https://github.com/PaddlePaddle/Paddle/pull/66815), [#67306](https://github.com/PaddlePaddle/Paddle/pull/67306)
-- Improvement of DCU/NPU/KUNLUN pipeline
- [#67516](https://github.com/PaddlePaddle/Paddle/pull/67516), [#67629](https://github.com/PaddlePaddle/Paddle/pull/67629), [#67987](https://github.com/PaddlePaddle/Paddle/pull/67987), [#69903](https://github.com/PaddlePaddle/Paddle/pull/69903), [#68448](https://github.com/PaddlePaddle/Paddle/pull/68448), [#70401](https://github.com/PaddlePaddle/Paddle/pull/70401), [#71192](https://github.com/PaddlePaddle/Paddle/pull/71192), [#71197](https://github.com/PaddlePaddle/Paddle/pull/71197), [#68027](https://github.com/PaddlePaddle/Paddle/pull/68027)
-- Support for Windows environment
- [#70390](https://github.com/PaddlePaddle/Paddle/pull/70390), [#70785](https://github.com/PaddlePaddle/Paddle/pull/70785), [#71286](https://github.com/PaddlePaddle/Paddle/pull/71286), [#71414](https://github.com/PaddlePaddle/Paddle/pull/71414), [#68901](https://github.com/PaddlePaddle/Paddle/pull/68901)
-- Improvement of third-party libraries
- [#71419](https://github.com/PaddlePaddle/Paddle/pull/71419)
-- Other optimizations are aimed at enhancing CI stability and execution efficiency
- [#67574](https://github.com/PaddlePaddle/Paddle/pull/67574), [#69058](https://github.com/PaddlePaddle/Paddle/pull/69058), [#70610](https://github.com/PaddlePaddle/Paddle/pull/70610), [#67093](https://github.com/PaddlePaddle/Paddle/pull/67093), [#69037](https://github.com/PaddlePaddle/Paddle/pull/69037), [#65213](https://github.com/PaddlePaddle/Paddle/pull/65213), [#65913](https://github.com/PaddlePaddle/Paddle/pull/65913), [#65947](https://github.com/PaddlePaddle/Paddle/pull/65947), [#66479](https://github.com/PaddlePaddle/Paddle/pull/66479), [#71054](https://github.com/PaddlePaddle/Paddle/pull/71054), [#71396](https://github.com/PaddlePaddle/Paddle/pull/71396)
+## 8. Environment Adaptation
-### New Features
-
-- Added Github Action mechanism
- [#70571](https://github.com/PaddlePaddle/Paddle/pull/70571), [#70626](https://github.com/PaddlePaddle/Paddle/pull/70626), [#71325](https://github.com/PaddlePaddle/Paddle/pull/71325), [#71344](https://github.com/PaddlePaddle/Paddle/pull/71344), [#71353](https://github.com/PaddlePaddle/Paddle/pull/71353), [#71322](https://github.com/PaddlePaddle/Paddle/pull/71322), [#70415](https://github.com/PaddlePaddle/Paddle/pull/70415), [#70465](https://github.com/PaddlePaddle/Paddle/pull/70465), [#70524](https://github.com/PaddlePaddle/Paddle/pull/70524), [#70550](https://github.com/PaddlePaddle/Paddle/pull/70550), [#70564](https://github.com/PaddlePaddle/Paddle/pull/70564), [#70579](https://github.com/PaddlePaddle/Paddle/pull/70579), [#70580](https://github.com/PaddlePaddle/Paddle/pull/70580), [#70963](https://github.com/PaddlePaddle/Paddle/pull/70963), [#71200](https://github.com/PaddlePaddle/Paddle/pull/71200), [#71261](https://github.com/PaddlePaddle/Paddle/pull/71261), [#71265](https://github.com/PaddlePaddle/Paddle/pull/71265)
+We have improved the stability and cross-platform compatibility of the framework and resolved compilation and installation failures on various platforms. We have upgraded key dependencies such as CUDA, further optimized the CI/CD process, improved build speed, and enhanced the overall stability of the system. In addition, we no longer maintain compilation and installation for the Python 3.8 environment.
-### Discarded
+### Bug Fixes
-- Cleanup of obsolete code and dependencies, including removing Python libraries that are no longer needed and simplifying compilation configurations to reduce maintenance costs
- [#65635](https://github.com/PaddlePaddle/Paddle/pull/65635), [#67542](https://github.com/PaddlePaddle/Paddle/pull/67542), [#67609](https://github.com/PaddlePaddle/Paddle/pull/67604), [#69572](https://github.com/PaddlePaddle/Paddle/pull/69572), [#68150](https://github.com/PaddlePaddle/Paddle/pull/68150), [#67604](https://github.com/PaddlePaddle/Paddle/pull/67604), [#68561](https://github.com/PaddlePaddle/Paddle/pull/68561), [#68904](https://github.com/PaddlePaddle/Paddle/pull/68904), [#67219](https://github.com/PaddlePaddle/Paddle/pull/67219)
+- Fixed compilation errors when using clang17 to compile third-party libraries. [#72524](https://github.com/PaddlePaddle/Paddle/pull/72524)
+- Fixed compilation issues when using CUDA 12.9. [#72808](https://github.com/PaddlePaddle/Paddle/pull/72808), [#72841](https://github.com/PaddlePaddle/Paddle/pull/72841), [#72978](https://github.com/PaddlePaddle/Paddle/pull/72978), [#73360](https://github.com/PaddlePaddle/Paddle/pull/73360)
+- Fixed compilation issues when using GCC 13.3. [#73144](https://github.com/PaddlePaddle/Paddle/pull/73144)
+- Fixed compilation issues when WITH_PIP_CUDA_LIBRARIES=ON. [#72907](https://github.com/PaddlePaddle/Paddle/pull/72907)
+- Fixed compilation issues when WITH_NVSHMEM=ON. [#73368](https://github.com/PaddlePaddle/Paddle/pull/73368)
-## 10. other
+### Improvements
-- Changes unrelated to user usage, including cleanup of obsolete code, code migration, cleanup of unit tests, debugging, or upgrades to monitoring mechanisms.
+- Avoid copying temporary files generated during the compilation of custom operators. [#73196](https://github.com/PaddlePaddle/Paddle/pull/73196)
+- Optimized warning messages. [#72877](https://github.com/PaddlePaddle/Paddle/pull/72877)
-### Developer-related content
+### Devs
-- Remove useless debugging code and migrate code
- [#65256](https://github.com/PaddlePaddle/Paddle/pull/65256), [#65782](https://github.com/PaddlePaddle/Paddle/pull/65782), [#65836](https://github.com/PaddlePaddle/Paddle/pull/65836), [#65840](https://github.com/PaddlePaddle/Paddle/pull/65840), [#65862](https://github.com/PaddlePaddle/Paddle/pull/65862), [#65863](https://github.com/PaddlePaddle/Paddle/pull/65863), [#65987](https://github.com/PaddlePaddle/Paddle/pull/65987), [#66547](https://github.com/PaddlePaddle/Paddle/pull/66547), [#66556](https://github.com/PaddlePaddle/Paddle/pull/66556), [#66645](https://github.com/PaddlePaddle/Paddle/pull/66645), [#66646](https://github.com/PaddlePaddle/Paddle/pull/66646), [#66648](https://github.com/PaddlePaddle/Paddle/pull/66648), [#66672](https://github.com/PaddlePaddle/Paddle/pull/66672), [#66783](https://github.com/PaddlePaddle/Paddle/pull/66783), [#66083](https://github.com/PaddlePaddle/Paddle/pull/66083), [#65562](https://github.com/PaddlePaddle/Paddle/pull/65562), [#66564](https://github.com/PaddlePaddle/Paddle/pull/66564), [#66370](https://github.com/PaddlePaddle/Paddle/pull/66370), [#66912](https://github.com/PaddlePaddle/Paddle/pull/66912), [#66913](https://github.com/PaddlePaddle/Paddle/pull/66913), [#66914](https://github.com/PaddlePaddle/Paddle/pull/66914), [#66915](https://github.com/PaddlePaddle/Paddle/pull/66915), [#66664](https://github.com/PaddlePaddle/Paddle/pull/66664), [#66671](https://github.com/PaddlePaddle/Paddle/pull/66671), [#66121](https://github.com/PaddlePaddle/Paddle/pull/66121), [#65907](https://github.com/PaddlePaddle/Paddle/pull/65907), [#65949](https://github.com/PaddlePaddle/Paddle/pull/65949), [#65950](https://github.com/PaddlePaddle/Paddle/pull/65950), [#65954](https://github.com/PaddlePaddle/Paddle/pull/65954), [#66545](https://github.com/PaddlePaddle/Paddle/pull/66545), [#66649](https://github.com/PaddlePaddle/Paddle/pull/66649), [#66900](https://github.com/PaddlePaddle/Paddle/pull/66900), 
[#66901](https://github.com/PaddlePaddle/Paddle/pull/66901), [#66902](https://github.com/PaddlePaddle/Paddle/pull/66902), [#66903](https://github.com/PaddlePaddle/Paddle/pull/66903), [#66904](https://github.com/PaddlePaddle/Paddle/pull/66904), [#66906](https://github.com/PaddlePaddle/Paddle/pull/66906), [#66907](https://github.com/PaddlePaddle/Paddle/pull/66907), [#66908](https://github.com/PaddlePaddle/Paddle/pull/66908), [#66909](https://github.com/PaddlePaddle/Paddle/pull/66909), [#66549](https://github.com/PaddlePaddle/Paddle/pull/66549), [#66555](https://github.com/PaddlePaddle/Paddle/pull/66555), [#66647](https://github.com/PaddlePaddle/Paddle/pull/66647), [#66898](https://github.com/PaddlePaddle/Paddle/pull/66898), [#66886](https://github.com/PaddlePaddle/Paddle/pull/66886), [#66042](https://github.com/PaddlePaddle/Paddle/pull/66042), [#66043](https://github.com/PaddlePaddle/Paddle/pull/66043), [#66045](https://github.com/PaddlePaddle/Paddle/pull/66045), [#66046](https://github.com/PaddlePaddle/Paddle/pull/66046), [#65826](https://github.com/PaddlePaddle/Paddle/pull/65826), [#65825](https://github.com/PaddlePaddle/Paddle/pull/65825), [#65827](https://github.com/PaddlePaddle/Paddle/pull/65827), [#65829](https://github.com/PaddlePaddle/Paddle/pull/65829), [#65830](https://github.com/PaddlePaddle/Paddle/pull/65830), [#65831](https://github.com/PaddlePaddle/Paddle/pull/65831), [#66081](https://github.com/PaddlePaddle/Paddle/pull/66081), [#66082](https://github.com/PaddlePaddle/Paddle/pull/66082), [#66087](https://github.com/PaddlePaddle/Paddle/pull/66087), [#65980](https://github.com/PaddlePaddle/Paddle/pull/65980), [#65981](https://github.com/PaddlePaddle/Paddle/pull/65981), [#65983](https://github.com/PaddlePaddle/Paddle/pull/65983), [#65985](https://github.com/PaddlePaddle/Paddle/pull/65985), [#65979](https://github.com/PaddlePaddle/Paddle/pull/65979), [#65986](https://github.com/PaddlePaddle/Paddle/pull/65986), 
[#65988](https://github.com/PaddlePaddle/Paddle/pull/65988), [#65989](https://github.com/PaddlePaddle/Paddle/pull/65989), [#66682](https://github.com/PaddlePaddle/Paddle/pull/66682), [#66717](https://github.com/PaddlePaddle/Paddle/pull/66717), [#65802](https://github.com/PaddlePaddle/Paddle/pull/65802), [#66159](https://github.com/PaddlePaddle/Paddle/pull/66159), [#66147](https://github.com/PaddlePaddle/Paddle/pull/66147), [#66149](https://github.com/PaddlePaddle/Paddle/pull/66149), [#66150](https://github.com/PaddlePaddle/Paddle/pull/66150), [#65798](https://github.com/PaddlePaddle/Paddle/pull/65798), [#65731](https://github.com/PaddlePaddle/Paddle/pull/65731), [#66145](https://github.com/PaddlePaddle/Paddle/pull/66145), [#66086](https://github.com/PaddlePaddle/Paddle/pull/66086), [#65781](https://github.com/PaddlePaddle/Paddle/pull/65781), [#65837](https://github.com/PaddlePaddle/Paddle/pull/65837), [#65828](https://github.com/PaddlePaddle/Paddle/pull/65828), [#65864](https://github.com/PaddlePaddle/Paddle/pull/65864), [#65959](https://github.com/PaddlePaddle/Paddle/pull/65959), [#65706](https://github.com/PaddlePaddle/Paddle/pull/65706), [#66918](https://github.com/PaddlePaddle/Paddle/pull/66918), [#66191](https://github.com/PaddlePaddle/Paddle/pull/66191), [#66689](https://github.com/PaddlePaddle/Paddle/pull/66689), [#66808](https://github.com/PaddlePaddle/Paddle/pull/66808), [#65424](https://github.com/PaddlePaddle/Paddle/pull/65424), [#65452](https://github.com/PaddlePaddle/Paddle/pull/65452), [#65463](https://github.com/PaddlePaddle/Paddle/pull/65463), [#65478](https://github.com/PaddlePaddle/Paddle/pull/65478), [#65339](https://github.com/PaddlePaddle/Paddle/pull/65339)
-- Standardize code namespaces
- [#64755](https://github.com/PaddlePaddle/Paddle/pull/64755), [#64765](https://github.com/PaddlePaddle/Paddle/pull/64765), [#64767](https://github.com/PaddlePaddle/Paddle/pull/64767), [#64770](https://github.com/PaddlePaddle/Paddle/pull/64770), [#64775](https://github.com/PaddlePaddle/Paddle/pull/64775), [#64776](https://github.com/PaddlePaddle/Paddle/pull/64776), [#64757](https://github.com/PaddlePaddle/Paddle/pull/64757), [#64780](https://github.com/PaddlePaddle/Paddle/pull/64780), [#64777](https://github.com/PaddlePaddle/Paddle/pull/64777), [#64779](https://github.com/PaddlePaddle/Paddle/pull/64779), [#64758](https://github.com/PaddlePaddle/Paddle/pull/64758), [#64759](https://github.com/PaddlePaddle/Paddle/pull/64759), [#64762](https://github.com/PaddlePaddle/Paddle/pull/64762)
-- Modify operator list
- [#66573](https://github.com/PaddlePaddle/Paddle/pull/66573), [#65598](https://github.com/PaddlePaddle/Paddle/pull/65598), [#65100](https://github.com/PaddlePaddle/Paddle/pull/65100), [#65385](https://github.com/PaddlePaddle/Paddle/pull/65385), [#65192](https://github.com/PaddlePaddle/Paddle/pull/65192), [#65118](https://github.com/PaddlePaddle/Paddle/pull/65118), [#65108](https://github.com/PaddlePaddle/Paddle/pull/65108), [#65153](https://github.com/PaddlePaddle/Paddle/pull/65153), [#65465](https://github.com/PaddlePaddle/Paddle/pull/65465), [#65128](https://github.com/PaddlePaddle/Paddle/pull/65128), [#65420](https://github.com/PaddlePaddle/Paddle/pull/65420), [#65099](https://github.com/PaddlePaddle/Paddle/pull/65099), [#65207](https://github.com/PaddlePaddle/Paddle/pull/65207), [#66066](https://github.com/PaddlePaddle/Paddle/pull/66066), [#65400](https://github.com/PaddlePaddle/Paddle/pull/65400), [#65160](https://github.com/PaddlePaddle/Paddle/pull/65160), [#65195](https://github.com/PaddlePaddle/Paddle/pull/65195), [#65445](https://github.com/PaddlePaddle/Paddle/pull/65445), [#65479](https://github.com/PaddlePaddle/Paddle/pull/65479), [#65193](https://github.com/PaddlePaddle/Paddle/pull/65193), [#65401](https://github.com/PaddlePaddle/Paddle/pull/65401), [#66724](https://github.com/PaddlePaddle/Paddle/pull/66724), [#65164](https://github.com/PaddlePaddle/Paddle/pull/65164), [#65466](https://github.com/PaddlePaddle/Paddle/pull/65466), [#65661](https://github.com/PaddlePaddle/Paddle/pull/65661), [#65897](https://github.com/PaddlePaddle/Paddle/pull/65897), [#66022](https://github.com/PaddlePaddle/Paddle/pull/66022), [#65313](https://github.com/PaddlePaddle/Paddle/pull/65313), [#65616](https://github.com/PaddlePaddle/Paddle/pull/65616), [#65588](https://github.com/PaddlePaddle/Paddle/pull/65588), [#65174](https://github.com/PaddlePaddle/Paddle/pull/65174), [#65402](https://github.com/PaddlePaddle/Paddle/pull/65402), 
[#65154](https://github.com/PaddlePaddle/Paddle/pull/65154), [#65151](https://github.com/PaddlePaddle/Paddle/pull/65151), [#65098](https://github.com/PaddlePaddle/Paddle/pull/65098), [#64953](https://github.com/PaddlePaddle/Paddle/pull/64953), [#65122](https://github.com/PaddlePaddle/Paddle/pull/65122), [#65590](https://github.com/PaddlePaddle/Paddle/pull/65590), [#65152](https://github.com/PaddlePaddle/Paddle/pull/65152)
-- The old executor function of the Paddle framework is being phased out
- [#65077](https://github.com/PaddlePaddle/Paddle/pull/65077), [#65340](https://github.com/PaddlePaddle/Paddle/pull/65340)
-- Error message prompt optimization
- [#66668](https://github.com/PaddlePaddle/Paddle/pull/66668), [#66675](https://github.com/PaddlePaddle/Paddle/pull/66675), [#66605](https://github.com/PaddlePaddle/Paddle/pull/66605), [#66613](https://github.com/PaddlePaddle/Paddle/pull/66613), [#66507](https://github.com/PaddlePaddle/Paddle/pull/66507), [#66700](https://github.com/PaddlePaddle/Paddle/pull/66700), [#66739](https://github.com/PaddlePaddle/Paddle/pull/66739), [#66719](https://github.com/PaddlePaddle/Paddle/pull/66719), [#66733](https://github.com/PaddlePaddle/Paddle/pull/66733), [#66552](https://github.com/PaddlePaddle/Paddle/pull/66552), [#66548](https://github.com/PaddlePaddle/Paddle/pull/66548), [#66623](https://github.com/PaddlePaddle/Paddle/pull/66623), [#66702](https://github.com/PaddlePaddle/Paddle/pull/66702), [#66705](https://github.com/PaddlePaddle/Paddle/pull/66705), [#66718](https://github.com/PaddlePaddle/Paddle/pull/66718), [#66727](https://github.com/PaddlePaddle/Paddle/pull/66727), [#66860](https://github.com/PaddlePaddle/Paddle/pull/66860), [#66869](https://github.com/PaddlePaddle/Paddle/pull/66869), [#66933](https://github.com/PaddlePaddle/Paddle/pull/66933), [#66939](https://github.com/PaddlePaddle/Paddle/pull/66939), [#66553](https://github.com/PaddlePaddle/Paddle/pull/66553), [#66774](https://github.com/PaddlePaddle/Paddle/pull/66774), [#66794](https://github.com/PaddlePaddle/Paddle/pull/66794), [#66551](https://github.com/PaddlePaddle/Paddle/pull/66551), [#66540](https://github.com/PaddlePaddle/Paddle/pull/66540), [#66617](https://github.com/PaddlePaddle/Paddle/pull/66617), [#66841](https://github.com/PaddlePaddle/Paddle/pull/66841), [#66788](https://github.com/PaddlePaddle/Paddle/pull/66788), [#66954](https://github.com/PaddlePaddle/Paddle/pull/66954), [#66698](https://github.com/PaddlePaddle/Paddle/pull/66698), [#66782](https://github.com/PaddlePaddle/Paddle/pull/66782), [#66844](https://github.com/PaddlePaddle/Paddle/pull/66844), 
[#66443](https://github.com/PaddlePaddle/Paddle/pull/66443), [#66455](https://github.com/PaddlePaddle/Paddle/pull/66455), [#66517](https://github.com/PaddlePaddle/Paddle/pull/66517), [#66804](https://github.com/PaddlePaddle/Paddle/pull/66804), [#66802](https://github.com/PaddlePaddle/Paddle/pull/66802), [#66536](https://github.com/PaddlePaddle/Paddle/pull/66536), [#66707](https://github.com/PaddlePaddle/Paddle/pull/66707), [#66525](https://github.com/PaddlePaddle/Paddle/pull/66525), [#66753](https://github.com/PaddlePaddle/Paddle/pull/66753), [#66550](https://github.com/PaddlePaddle/Paddle/pull/66550), [#66857](https://github.com/PaddlePaddle/Paddle/pull/66857), [#66471](https://github.com/PaddlePaddle/Paddle/pull/66471), [#66628](https://github.com/PaddlePaddle/Paddle/pull/66628), [#66469](https://github.com/PaddlePaddle/Paddle/pull/66469), [#66775](https://github.com/PaddlePaddle/Paddle/pull/66775), [#66506](https://github.com/PaddlePaddle/Paddle/pull/66506), [#66780](https://github.com/PaddlePaddle/Paddle/pull/66780), [#66953](https://github.com/PaddlePaddle/Paddle/pull/66953), [#66695](https://github.com/PaddlePaddle/Paddle/pull/66695), [#66603](https://github.com/PaddlePaddle/Paddle/pull/66603), [#66491](https://github.com/PaddlePaddle/Paddle/pull/66491), [#66715](https://github.com/PaddlePaddle/Paddle/pull/66715), [#66632](https://github.com/PaddlePaddle/Paddle/pull/66632), [#66594](https://github.com/PaddlePaddle/Paddle/pull/66594), [#66615](https://github.com/PaddlePaddle/Paddle/pull/66615), [#66578](https://github.com/PaddlePaddle/Paddle/pull/66578), [#66534](https://github.com/PaddlePaddle/Paddle/pull/66534), [#66569](https://github.com/PaddlePaddle/Paddle/pull/66569), [#66529](https://github.com/PaddlePaddle/Paddle/pull/66529), [#66530](https://github.com/PaddlePaddle/Paddle/pull/66530), [#66522](https://github.com/PaddlePaddle/Paddle/pull/66522), [#66789](https://github.com/PaddlePaddle/Paddle/pull/66789), 
[#66600](https://github.com/PaddlePaddle/Paddle/pull/66600), [#66511](https://github.com/PaddlePaddle/Paddle/pull/66511), [#66512](https://github.com/PaddlePaddle/Paddle/pull/66512), [#66527](https://github.com/PaddlePaddle/Paddle/pull/66527), [#66518](https://github.com/PaddlePaddle/Paddle/pull/66518), [#66958](https://github.com/PaddlePaddle/Paddle/pull/66958), [#66532](https://github.com/PaddlePaddle/Paddle/pull/66532), [#65258](https://github.com/PaddlePaddle/Paddle/pull/65258), [#66487](https://github.com/PaddlePaddle/Paddle/pull/66487), [#66876](https://github.com/PaddlePaddle/Paddle/pull/66876), [#66832](https://github.com/PaddlePaddle/Paddle/pull/66832), [#66872](https://github.com/PaddlePaddle/Paddle/pull/66872), [#66830](https://github.com/PaddlePaddle/Paddle/pull/66830), [#66708](https://github.com/PaddlePaddle/Paddle/pull/66708), [#66502](https://github.com/PaddlePaddle/Paddle/pull/66502), [#66521](https://github.com/PaddlePaddle/Paddle/pull/66521), [#66592](https://github.com/PaddlePaddle/Paddle/pull/66592)
+- Compilation, installation, maintenance, and upgrade. [#71911](https://github.com/PaddlePaddle/Paddle/pull/71911), [#73005](https://github.com/PaddlePaddle/Paddle/pull/73005)
+- Image maintenance and updates. [#71065](https://github.com/PaddlePaddle/Paddle/pull/71065), [#71821](https://github.com/PaddlePaddle/Paddle/pull/71821)
+- Import, export, and update of symbols for the Windows platform. [#72497](https://github.com/PaddlePaddle/Paddle/pull/72497), [#72498](https://github.com/PaddlePaddle/Paddle/pull/72498), [#72500](https://github.com/PaddlePaddle/Paddle/pull/72500)
+- Added CUDA 12.8 support on the Windows platform. [#72433](https://github.com/PaddlePaddle/Paddle/pull/72433)
+- CI maintenance and upgrade. [#72443](https://github.com/PaddlePaddle/Paddle/pull/72443), [#72836](https://github.com/PaddlePaddle/Paddle/pull/72836), [#72563](https://github.com/PaddlePaddle/Paddle/pull/72563), [#72653](https://github.com/PaddlePaddle/Paddle/pull/72653), [#72477](https://github.com/PaddlePaddle/Paddle/pull/72477), [#72778](https://github.com/PaddlePaddle/Paddle/pull/72778), [#72960](https://github.com/PaddlePaddle/Paddle/pull/72960), [#73289](https://github.com/PaddlePaddle/Paddle/pull/73289), [#73422](https://github.com/PaddlePaddle/Paddle/pull/73422), [#73514](https://github.com/PaddlePaddle/Paddle/pull/73514), [#72748](https://github.com/PaddlePaddle/Paddle/pull/72748)
+- GitHub Actions CI setup. [#71738](https://github.com/PaddlePaddle/Paddle/pull/71738), [#70602](https://github.com/PaddlePaddle/Paddle/pull/70602), [#71958](https://github.com/PaddlePaddle/Paddle/pull/71958), [#71959](https://github.com/PaddlePaddle/Paddle/pull/71959), [#71992](https://github.com/PaddlePaddle/Paddle/pull/71992), [#72013](https://github.com/PaddlePaddle/Paddle/pull/72013), [#72153](https://github.com/PaddlePaddle/Paddle/pull/72153), [#72031](https://github.com/PaddlePaddle/Paddle/pull/72031), [#72141](https://github.com/PaddlePaddle/Paddle/pull/72141), [#72104](https://github.com/PaddlePaddle/Paddle/pull/72104), [#72182](https://github.com/PaddlePaddle/Paddle/pull/72182), [#72342](https://github.com/PaddlePaddle/Paddle/pull/72342), [#72352](https://github.com/PaddlePaddle/Paddle/pull/72352), [#72249](https://github.com/PaddlePaddle/Paddle/pull/72249), [#72068](https://github.com/PaddlePaddle/Paddle/pull/72068), [#72441](https://github.com/PaddlePaddle/Paddle/pull/72441), [#72392](https://github.com/PaddlePaddle/Paddle/pull/72392), [#72446](https://github.com/PaddlePaddle/Paddle/pull/72446), [#72435](https://github.com/PaddlePaddle/Paddle/pull/72435), [#72515](https://github.com/PaddlePaddle/Paddle/pull/72515), [#72514](https://github.com/PaddlePaddle/Paddle/pull/72514), [#72396](https://github.com/PaddlePaddle/Paddle/pull/72396), [#72547](https://github.com/PaddlePaddle/Paddle/pull/72547), [#72345](https://github.com/PaddlePaddle/Paddle/pull/72345), [#72236](https://github.com/PaddlePaddle/Paddle/pull/72236), [#72586](https://github.com/PaddlePaddle/Paddle/pull/72586), [#72537](https://github.com/PaddlePaddle/Paddle/pull/72537), [#72609](https://github.com/PaddlePaddle/Paddle/pull/72609), [#72632](https://github.com/PaddlePaddle/Paddle/pull/72632), [#72642](https://github.com/PaddlePaddle/Paddle/pull/72642), [#72673](https://github.com/PaddlePaddle/Paddle/pull/72673), [#72647](https://github.com/PaddlePaddle/Paddle/pull/72647), 
[#72696](https://github.com/PaddlePaddle/Paddle/pull/72696), [#72771](https://github.com/PaddlePaddle/Paddle/pull/72771), [#72711](https://github.com/PaddlePaddle/Paddle/pull/72711), [#72680](https://github.com/PaddlePaddle/Paddle/pull/72680), [#72774](https://github.com/PaddlePaddle/Paddle/pull/72774), [#72813](https://github.com/PaddlePaddle/Paddle/pull/72813), [#72804](https://github.com/PaddlePaddle/Paddle/pull/72804), [#72903](https://github.com/PaddlePaddle/Paddle/pull/72903), [#72900](https://github.com/PaddlePaddle/Paddle/pull/72900), [#72932](https://github.com/PaddlePaddle/Paddle/pull/72932), [#72967](https://github.com/PaddlePaddle/Paddle/pull/72967), [#72991](https://github.com/PaddlePaddle/Paddle/pull/72991), [#72115](https://github.com/PaddlePaddle/Paddle/pull/72115), [#73242](https://github.com/PaddlePaddle/Paddle/pull/73242), [#72801](https://github.com/PaddlePaddle/Paddle/pull/72801), [#73433](https://github.com/PaddlePaddle/Paddle/pull/73433), [#73391](https://github.com/PaddlePaddle/Paddle/pull/73391), [#73456](https://github.com/PaddlePaddle/Paddle/pull/73456), [#73376](https://github.com/PaddlePaddle/Paddle/pull/73376), [#73453](https://github.com/PaddlePaddle/Paddle/pull/73453), [#73481](https://github.com/PaddlePaddle/Paddle/pull/73481), [#73546](https://github.com/PaddlePaddle/Paddle/pull/73546), [#73446](https://github.com/PaddlePaddle/Paddle/pull/73446), [#72744](https://github.com/PaddlePaddle/Paddle/pull/72744)
-### Discarded
+### Deprecations
-- Clean up abandoned code and useless unit tests
- [#65894](https://github.com/PaddlePaddle/Paddle/pull/65894), [#66165](https://github.com/PaddlePaddle/Paddle/pull/66165), [#66293](https://github.com/PaddlePaddle/Paddle/pull/66293), [#66102](https://github.com/PaddlePaddle/Paddle/pull/66102), [#66442](https://github.com/PaddlePaddle/Paddle/pull/66442), [#66922](https://github.com/PaddlePaddle/Paddle/pull/66922), [#66531](https://github.com/PaddlePaddle/Paddle/pull/66531), [#65518](https://github.com/PaddlePaddle/Paddle/pull/65518), [#66800](https://github.com/PaddlePaddle/Paddle/pull/66800), [#66372](https://github.com/PaddlePaddle/Paddle/pull/66372), [#65902](https://github.com/PaddlePaddle/Paddle/pull/65902), [#65462](https://github.com/PaddlePaddle/Paddle/pull/65462), [#65327](https://github.com/PaddlePaddle/Paddle/pull/65327), [#65189](https://github.com/PaddlePaddle/Paddle/pull/65189), [#65181](https://github.com/PaddlePaddle/Paddle/pull/65181), [#66535](https://github.com/PaddlePaddle/Paddle/pull/66535), [#65383](https://github.com/PaddlePaddle/Paddle/pull/65383), [#65173](https://github.com/PaddlePaddle/Paddle/pull/65173), [#66429](https://github.com/PaddlePaddle/Paddle/pull/66429), [#66386](https://github.com/PaddlePaddle/Paddle/pull/66386), [#66447](https://github.com/PaddlePaddle/Paddle/pull/66447), [#66367](https://github.com/PaddlePaddle/Paddle/pull/66367), [#66160](https://github.com/PaddlePaddle/Paddle/pull/66160), [#65408](https://github.com/PaddlePaddle/Paddle/pull/65408), [#65433](https://github.com/PaddlePaddle/Paddle/pull/65433), [#65481](https://github.com/PaddlePaddle/Paddle/pull/65481), [#65444](https://github.com/PaddlePaddle/Paddle/pull/65444), [#65389](https://github.com/PaddlePaddle/Paddle/pull/65389), [#65663](https://github.com/PaddlePaddle/Paddle/pull/65663), [#65649](https://github.com/PaddlePaddle/Paddle/pull/65649), [#65629](https://github.com/PaddlePaddle/Paddle/pull/65629), [#66142](https://github.com/PaddlePaddle/Paddle/pull/66142), 
[#65796](https://github.com/PaddlePaddle/Paddle/pull/65796), [#66163](https://github.com/PaddlePaddle/Paddle/pull/66163), [#66291](https://github.com/PaddlePaddle/Paddle/pull/66291), [#65480](https://github.com/PaddlePaddle/Paddle/pull/65480), [#65495](https://github.com/PaddlePaddle/Paddle/pull/65495), [#65498](https://github.com/PaddlePaddle/Paddle/pull/65498), [#65503](https://github.com/PaddlePaddle/Paddle/pull/65503), [#65502](https://github.com/PaddlePaddle/Paddle/pull/65502), [#65501](https://github.com/PaddlePaddle/Paddle/pull/65501), [#65512](https://github.com/PaddlePaddle/Paddle/pull/65512), [#65528](https://github.com/PaddlePaddle/Paddle/pull/65528), [#65472](https://github.com/PaddlePaddle/Paddle/pull/65472), [#65390](https://github.com/PaddlePaddle/Paddle/pull/65390), [#65344](https://github.com/PaddlePaddle/Paddle/pull/65344), [#65384](https://github.com/PaddlePaddle/Paddle/pull/65384), [#65388](https://github.com/PaddlePaddle/Paddle/pull/65388), [#65198](https://github.com/PaddlePaddle/Paddle/pull/65198), [#65248](https://github.com/PaddlePaddle/Paddle/pull/65248), [#65443](https://github.com/PaddlePaddle/Paddle/pull/65443), [#65430](https://github.com/PaddlePaddle/Paddle/pull/65430)
+- Discontinued support for compiling in a Python 3.8 environment. [#72827](https://github.com/PaddlePaddle/Paddle/pull/72827)
-## 11. List of contributors
+## 9. List of contributors
-0x3878f, 0x45f, 2742195759, 86kkd, A-nnonymous, ADream-ki, Aganlengzi, Albresky, AndPuQing, AndSonder, Aoraki-Dream, ApricityXX, Asthestarsfalll, Aurelius84, BHmingyang, BeingGod, Betelgeu, BiynXu, CJ77Qi, Caogration, DDDivano, Dale1314, Deleter-D, DesmonDay, Difers, Dmovic, DongBaiYue, DrRyanHuang, DrownFish19, Eddie-Wang1120, EgoistSA, FeixLiu, ForFishes, Fripping, From00, Function-Samuel, GoldenStain, Guanhuachen2003, GuoxiaWang, Hanyonggong, HarperCy, Hongqing-work, HydrogenSulfate, JZ-LIANG, Jeff114514, JiaWenxuan, LLee233, LanCole, Lans1ot, Layssy, Leoforever123, LiYuRio, LielinJiang, LittleHeroZZZX, Liujie0926, Liyulingyue, Luohongzhige, Marcusryz, MarisaSparkL, Micalling, MikhayEeer, MrXnneHang, MufanColin, NKNaN, Neo-WY, NeroLoh, PolaKuma, Qin-sx, QingshuChen, RachelXu7, RichardWooSJTU, RuohengMa, SCUcookie, Sekiro-x, SigureMo, Sunny-bot1, SylarTiaNII, Sylence8, TBD1, TR666, TimeYWL, Tom-Zheng, Turingg, Victor-Bayim, Vvsmile, WAYKEN-TSE, Wanglongzhi2001, Wangzheee, Waynezee, Wennie396, Whsjrczr, Wizard-ZP, Wong4j, XavierZXY, XiaociZhang, XieYunshen, Xing-lil, Xreki, YKTian-x2b, YZW-explorer, YanhuiDua, YuanRisheng, ZHOU05030, ZhangHandi, ZhangX-21, ZibinGuo, a2064968462, anderson101866, aooxin, aquagull, baoqiwen, bapijun, blacksheep-Aristotle, bukejiyu, carryyu, ccsuzzh, chang-wenbin, changeyoung98, chen2016013, ckl117, cmcamdy, co63oc, continue-coding, cqulilujia, crazyxiaoxi, cszdrg, cubehan3, cyber-pioneer, danleifeng, decade-afk, deepllz, dynamicheart, eee4017, eggman-1024, enkilee, epiphanyer, ethan-sem, fangfangssj, feixi21, fightfat, fufu0615, fxfxfxfxfxfxfxfx, fxy1699, gitliuyf, gongel, gongshaotian, gongweibao, gouzil, gsq7474741, guixxiic, gzy19990617, hanyang2508, haoyu2022, heavyrain-lzy, houj04, huangjiyi, huangkr03, hxzd5568, icpcccpc, inaomIIsfarell, iosmers, jeff41404, jerrywgz, jiachengdai, jiahy0825, jinmingyi1998, jinyouzhi, joseflv, jychen21, jzhang533, kangguangli, kanze1, kineast, kircle888, l1cacheDell, leo0519, lifulll, linkk08, 
little1d, liufengwei0103, liuruyan, lixcli, liym27, liyongchao911, lizexu123, lizhenyun01, lj970926, lshpku, lszxb, ltd0924, luotao1, lwkhahaha, lxd-cumt, mayang002, megemini, mikemikimike, ming1753, monster1015, mori0umi, ndyysheep, nizne9, nobodynobody, ooooo-create, penPenf28, phlrain, pkuzyc, qili93, rich04lin, risemeup1, ronny1996, rsmallblue, runzhech, skywalker2012, smile2game, sneaxiy, successfulbarrier, sunzhongkai588, swgu98, tc20042008, tianhaodongbd, tianshuo78520a, tizhou86, tlxd, uanu2002, umiswing, vivienfanghuagood, waliwali777, walkalone20, wanghuancoder, wangna11BD, will-jl944, winffke, winter-wang, wwwuyan, xiaoguoguo626807, xiaoluomi, xiaoyao0115, xingmingyyj, xkkkkkk23, xu8117, xuxinyi389, xz-alex, yangrongxinuser, yeteye, yinfan98, yongqiangma, yuan20041218, yuanlehome, yuguo-Jack, yumin066, zbt78, zeroRains, zhangbo9674, zhanghonggeng, zhanglirong1999, zhangting2020, zhangyk0314, zhangyuqin1998, zhiminzhang0830, zhink, zhiqiu, zhouquan32, zhoutianzi666, zhwesky2010, zoooo0820, zrr1999, zty-king, zxcd, zyfncg
+0x3878f, A-nnonymous, AndSonder, ApricityXX, aquagull, author, baoqiwen, BeingGod, blacksheep-Aristotle, BoShen5, bukejiyu, cangtianhuang, carryyu, chang-wenbin, changeyoung98, chen2016013, ckl117, co63oc, cqulilujia, crashbussy, cszdrg, Cutelemon6, cyy536, DanielSun11, danleifeng, datutu-L, deepllz, Dmovic, DrRyanHuang, dynamicheart, Eddie-Wang1120, eggman-1024, emmanuel-ferdman, Enigmatisms, enkilee, fangfangssj, feixi21, FeixLiu, ForFishes, Function-Samuel, ggggxm, GITD245, Glencsa, GoldenStain, gongshaotian, gouzil, gzy19990617, hanlintang, Hongqing-work, houj04, huangjiyi, hxzd5568, HydrogenSulfate, jzhang533, LCStayingdullCircuit, leon062112, lifulll, linkk08, LittleHeroZZZX, liufengwei0103, Liujie0926, liuruyan, lixinqi, LiYuRio, lizexu123, lizhenyun01, lj970926, lshpku, megemini, mikethegoblin, ming1753, mzj104, NKNaN, ooooo-create, pesionzhao, phlrain, pkuzyc, PolaKuma, Qin-sx, RichardWooSJTU, risemeup1, runzhech, RuohengMa, sasaya123, shanjiang7, SigureMo, sneaxiy, swgu98, SylarTiaNII, tianhaodongbd, tianshuo78520a, timminator, tizhou86, umiswing, waliwali777, wanghuancoder, Waynezee, Wennie396, xiaoguoguo626807, XieYunshen, Xing-lil, xkkkkkk23, Xreki, xuxinyi389, Yeenyeong, yongqiangma, YqGe585, yuanlehome, YuanRisheng, yulangz, yuwu46, zeroRains, zhangbo9674, zhanghonggeng, zhangting2020, ZhangX-21, zhangyk0314, zhangyuqin1998, zhink, zhiqiu, zhouquan32, zhoutianzi666, zhupengyang, zrr1999, zty-king, zyfncg