From d9b8a55f3cf2f6bf37d5e99ff232f726401d2e30 Mon Sep 17 00:00:00 2001
From: xxiu1 <102810673+xxiu1@users.noreply.github.com>
Date: Tue, 28 Oct 2025 13:01:17 +0800
Subject: [PATCH 1/3] [CodeStyle][Typos][C-[10-15]] Fix typo(`commmit`,`complie`,`condtional`,`conjuction`,`contruct`,`contructed`) (#7586)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* [Docathon][Fix Doc Format No.22] fix doc ParallelEnv_cn

* [Docathon][Fix Doc Format No.23、24] fix doc parallelize_cn.rst、Strategy_cn.rst

* Revert: restore parallelize_cn.rst to upstream/develop version

* fix typos and update documentation

---------

Co-authored-by: Echo-Nie <157974576+Echo-Nie@users.noreply.github.com>
---
 _typos.toml                                            | 6 ------
 docs/design/concurrent/go_op.md                        | 2 +-
 docs/design/memory/memory_optimization.md              | 2 +-
 docs/design/others/gan_api.md                          | 2 +-
 docs/dev_guides/git_guides/local_dev_guide_cn.md       | 2 +-
 .../{complie_and_test_cn.md => compile_and_test_cn.md} | 0
 docs/dev_guides/sugon/index_cn.rst                     | 4 ++--
 docs/guides/advanced/layer_and_model_en.md             | 2 +-
 8 files changed, 7 insertions(+), 13 deletions(-)
 rename docs/dev_guides/sugon/{complie_and_test_cn.md => compile_and_test_cn.md} (100%)

diff --git a/_typos.toml b/_typos.toml
index 264515280ef..7fcad764a04 100644
--- a/_typos.toml
+++ b/_typos.toml
@@ -60,13 +60,7 @@ cantains = "cantains"
 classfy = "classfy"
 cliping = "cliping"
 colunms = "colunms"
-commmit = "commmit"
-complie = "complie"
-condtional = "condtional"
-conjuction = "conjuction"
 containg = "containg"
-contruct = "contruct"
-contructed = "contructed"
 contruction = "contruction"
 contxt = "contxt"
 convertion = "convertion"
diff --git a/docs/design/concurrent/go_op.md b/docs/design/concurrent/go_op.md
index 286db8f86e3..1248829f560 100644
--- a/docs/design/concurrent/go_op.md
+++ b/docs/design/concurrent/go_op.md
@@ -3,7 +3,7 @@
 ## Introduction
 
 The **go_op** allows user's of PaddlePaddle to run program blocks on a detached
-thread. It works in conjuction with CSP operators (channel_send,
+thread. It works in conjunction with CSP operators (channel_send,
 channel_receive, channel_open, channel_close, and select) to allow users to
 concurrently process data and communicate easily between different threads.
 
diff --git a/docs/design/memory/memory_optimization.md b/docs/design/memory/memory_optimization.md
index 8a3e3a4ae50..fee0f493b6f 100644
--- a/docs/design/memory/memory_optimization.md
+++ b/docs/design/memory/memory_optimization.md
@@ -99,7 +99,7 @@ At last, we take basic strategy and liveness analysis techniques learning from c
 
 In-place is a built-in attribute of an operator. Since we treat in-place and other operators differently, we have to add an in-place attribute for every operator.
 
-#### contruct control flow graph
+#### construct control flow graph
 
 Following is the ProgramDesc protobuf of [machine translation](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/book/test_machine_translation.py) example.
 
diff --git a/docs/design/others/gan_api.md b/docs/design/others/gan_api.md
index f46b9634d71..4aabf6ebe1b 100644
--- a/docs/design/others/gan_api.md
+++ b/docs/design/others/gan_api.md
@@ -42,7 +42,7 @@ build the whole GAN model, define training loss for both generator and discrimat
 To be more detailed, we introduce our design of DCGAN as following:
 
 ### Class member Function: Initializer
-- Set up hyper-parameters, including condtional dimension, noise dimension, batch size and so forth. 
+- Set up hyper-parameters, including conditional dimension, noise dimension, batch size and so forth.
 - Declare and define all the model variables. All the discriminator parameters are included in the list self.theta_D and all the generator parameters are included in the list self.theta_G.
 ```python
 class DCGAN:
diff --git a/docs/dev_guides/git_guides/local_dev_guide_cn.md b/docs/dev_guides/git_guides/local_dev_guide_cn.md
index 5a32cbba40a..747fb59421f 100644
--- a/docs/dev_guides/git_guides/local_dev_guide_cn.md
+++ b/docs/dev_guides/git_guides/local_dev_guide_cn.md
@@ -116,7 +116,7 @@ clang-format.......................................(no files to check)Skipped
  create mode 100644 233
 ```
 
-可以看到,在执行`git commit`后,输出了一些额外的信息。这是使用`pre-commmit`进行代码风格检查的结果,关于代码风格检查的使用问题请参考[代码风格检查指南](./codestyle_check_guide_cn.html)。
+可以看到,在执行`git commit`后,输出了一些额外的信息。这是使用`pre-commit`进行代码风格检查的结果,关于代码风格检查的使用问题请参考[代码风格检查指南](./codestyle_check_guide_cn.html)。
 
 ## 保持本地仓库最新
 
diff --git a/docs/dev_guides/sugon/complie_and_test_cn.md b/docs/dev_guides/sugon/compile_and_test_cn.md
similarity index 100%
rename from docs/dev_guides/sugon/complie_and_test_cn.md
rename to docs/dev_guides/sugon/compile_and_test_cn.md
diff --git a/docs/dev_guides/sugon/index_cn.rst b/docs/dev_guides/sugon/index_cn.rst
index 84d57649217..4dc16f73b41 100644
--- a/docs/dev_guides/sugon/index_cn.rst
+++ b/docs/dev_guides/sugon/index_cn.rst
@@ -4,7 +4,7 @@
 
 以下将说明 Paddle 适配曙光相关的开发指南:
 
-- `曙光智算平台-Paddle 源码编译和单测执行 <./complie_and_test_cn.html>`_ : 如何曙光曙光智算平台编译 Paddle 源码编译并执行单测。
+- `曙光智算平台-Paddle 源码编译和单测执行 <./compile_and_test_cn.html>`_ : 如何在曙光智算平台编译 Paddle 源码并执行单测。
 - `Paddle 适配 C86 加速卡详解 <./paddle_c86_cn.html>`_ : 详解 Paddle 适配 C86 加速卡。
 - `Paddle 框架下 ROCm(HIP)算子单测修复指导 <./paddle_c86_fix_guides_cn.html>`_ : 指导 Paddle 框架下 ROCm(HIP)算子单测修复。
 
@@ -12,6 +12,6 @@
 .. toctree::
     :hidden:
 
-    complie_and_test_cn.md
+    compile_and_test_cn.md
     paddle_c86_cn.md
     paddle_c86_fix_guides_cn.md
diff --git a/docs/guides/advanced/layer_and_model_en.md b/docs/guides/advanced/layer_and_model_en.md
index 256b286c7ae..35f2214a6e3 100644
--- a/docs/guides/advanced/layer_and_model_en.md
+++ b/docs/guides/advanced/layer_and_model_en.md
@@ -27,7 +27,7 @@ class Model(paddle.nn.Layer):
         return y
 ```
 
-Here we contructed a ``Model`` which inherited from ``paddle.nn.Layer``. This model only holds a single layer of ``paddle.nn.Flatten``, which flattens the input variables **inputs** upon execution.
+Here we constructed a ``Model`` which inherited from ``paddle.nn.Layer``. This model only holds a single layer of ``paddle.nn.Flatten``, which flattens the input variables **inputs** upon execution.
 
 ## Sublayers

From c1bccf21ba794e5b9c61f4caf46c06289b5a717f Mon Sep 17 00:00:00 2001
From: Bowen Xiao <2200013174@stu.pku.edu.cn>
Date: Tue, 28 Oct 2025 13:20:37 +0800
Subject: [PATCH 2/3] [CodeStyle][Typos][P-[1-10],P-[13-15]] Fix typo("Prallel", "Paremeter", "Pipline", "Porgram", "Propogation", "Protocal", "Pyhton", "parammeters", "palce", "poniter", "promot", "propegation", "provicded") (#7582)

* [Fix] typo P[1-10], P[13-15]

* [Fix] fix omitted typo in P-8 and remove corresponding lines in _typos.toml
---
 _typos.toml                                        | 14 --------------
 ci_scripts/check_api_parameters.py                 | 14 +++++++-------
 docs/api/paddle/jit/TranslatedLayer_cn.rst         |  2 +-
 docs/design/concurrent/go_op.md                    |  4 ++--
 docs/design/modules/net_op_design.md               |  2 +-
 docs/design/others/graph_survey.md                 |  2 +-
 docs/design/phi/design_en.md                       |  2 +-
 .../api_contributing_guides/auto_parallel_op_cn.md |  2 +-
 docs/dev_guides/custom_device_docs/event_api_en.md |  2 +-
 .../type_annotations_specification_cn.md           |  2 +-
 docs/eval/evaluation_of_docs_system.md             |  6 +++---
 docs/guides/jit/debugging_en.md                    |  2 +-
 .../transformers.GenerationConfig.md               |  4 ++--
 docs/guides/paddle_v3_features/paddle_ir_cn.md     |  2 +-
 14 files changed, 23 insertions(+), 37 deletions(-)

diff --git a/_typos.toml b/_typos.toml
index 7fcad764a04..d3031fb6f45 100644
--- a/_typos.toml
+++ b/_typos.toml
@@ -37,14 +37,6 @@ Moible = "Moible"
 Operaton = "Operaton"
 Optimizaing = "Optimizaing"
 Optimzier = "Optimzier"
-Paremeter = "Paremeter"
-Pipline = "Pipline"
-Porgram = "Porgram"
-Prallel = "Prallel"
-Propegation = "Propegation"
-Propogation = "Propogation"
-Protocal = "Protocal"
-Pyhton = "Pyhton"
 REGISTE = "REGISTE"
 Reivew = "Reivew"
 Reuqest = "Reuqest"
@@ -141,14 +133,8 @@ outpu = "outpu"
 outpus = "outpus"
 overrided = "overrided"
 overwrited = "overwrited"
-palce = "palce"
-parammeters = "parammeters"
-poniter = "poniter"
 porcess = "porcess"
 processer = "processer"
-promot = "promot"
-propegation = "propegation"
-provicded = "provicded"
 recevied = "recevied"
 recomment = "recomment"
 registerd = "registerd"
diff --git a/ci_scripts/check_api_parameters.py b/ci_scripts/check_api_parameters.py
index 2e6d8b18e2e..c6662f86903 100644
--- a/ci_scripts/check_api_parameters.py
+++ b/ci_scripts/check_api_parameters.py
@@ -107,7 +107,7 @@ def _check_params_in_description(rstfilename, paramstr):
                 )
         else:
             info = f"The number of params in title does not match the params in description: {len(params_in_title)} != {len(items)}."
-            print(f"check failed (parammeters description): {rstfilename}")
+            print(f"check failed (parameters description): {rstfilename}")
     else:
         for i in range(len(items)):
             pname_in_title = params_in_title[i].split("=")[0].strip()
@@ -120,13 +120,13 @@ def _check_params_in_description(rstfilename, paramstr):
                 flag = False
                 info = f"the following param in title does not match the param in description: {pname_in_title} != {pname_indesc}."
                 print(
-                    f"check failed (parammeters description): {rstfilename}, {pname_in_title} != {pname_indesc}"
+                    f"check failed (parameters description): {rstfilename}, {pname_in_title} != {pname_indesc}"
                 )
             else:
                 flag = False
                 info = f"param name '{pname_in_title}' not matched in description line{i + 1}, check it please."
                 print(
-                    f"check failed (parammeters description): {rstfilename}, param name not found in {i} paragraph."
+                    f"check failed (parameters description): {rstfilename}, param name not found in {i} paragraph."
                 )
     else:
         if params_in_title:
@@ -148,8 +148,8 @@ def _check_params_in_description_with_fullargspec(rstfilename, funcname):
         params_inspec = funcspec.args
         if len(items) != len(params_inspec):
             flag = False
-            info = f"check_with_fullargspec failed (parammeters description): {rstfilename}"
-            print(f"check failed (parammeters description): {rstfilename}")
+            info = f"check_with_fullargspec failed (parameters description): {rstfilename}"
+            print(f"check failed (parameters description): {rstfilename}")
         else:
             for i in range(len(items)):
                 pname_in_title = params_inspec[i]
@@ -162,13 +162,13 @@ def _check_params_in_description_with_fullargspec(rstfilename, funcname):
                 flag = False
                 info = f"the following param in title does not match the param in description: {pname_in_title} != {pname_indesc}."
                 print(
-                    f"check failed (parammeters description): {rstfilename}, {pname_in_title} != {pname_indesc}"
+                    f"check failed (parameters description): {rstfilename}, {pname_in_title} != {pname_indesc}"
                 )
             else:
                 flag = False
                 info = f"param name '{pname_in_title}' not matched in description line{i + 1}, check it please."
                 print(
-                    f"check failed (parammeters description): {rstfilename}, param name not found in {i} paragraph."
+                    f"check failed (parameters description): {rstfilename}, param name not found in {i} paragraph."
                 )
     else:
         if funcspec.args:
diff --git a/docs/api/paddle/jit/TranslatedLayer_cn.rst b/docs/api/paddle/jit/TranslatedLayer_cn.rst
index 7a3ee0826eb..369f453195f 100644
--- a/docs/api/paddle/jit/TranslatedLayer_cn.rst
+++ b/docs/api/paddle/jit/TranslatedLayer_cn.rst
@@ -24,7 +24,7 @@ program(method_name='forward'):
 
 **参数**
 
-    - **method_name** (string) - 要获取的 Porgram 对应的方法名。默认值为"forward"。
+    - **method_name** (string) - 要获取的 Program 对应的方法名。默认值为"forward"。
 
 **返回**
 Program
diff --git a/docs/design/concurrent/go_op.md b/docs/design/concurrent/go_op.md
index 1248829f560..95fc8948d7b 100644
--- a/docs/design/concurrent/go_op.md
+++ b/docs/design/concurrent/go_op.md
@@ -27,7 +27,7 @@ The go operator can be accessed by using the fluid.Go() control flow. This will
 create a new sub block, where the user can add additional operators to be ran on
 the thread.
 
-**Note:** Since back propegation is currently not support in the go_op, users
+**Note:** Since back propagation is currently not supported in the go_op, users
 should ensure that operators in the go block does not require gradient
 calculations.
 
@@ -225,7 +225,7 @@ when spawning these threads.
 
 For the first version of CSP, we only support OS threads.
 
-#### Backward Propegation:
+#### Backward Propagation:
 
 go_op currently does not support backwards propagation. Please use go_op with
 non training operators.
diff --git a/docs/design/modules/net_op_design.md b/docs/design/modules/net_op_design.md
index 15f44185a4d..6e7d87f86ff 100644
--- a/docs/design/modules/net_op_design.md
+++ b/docs/design/modules/net_op_design.md
@@ -95,7 +95,7 @@ class PlainNet : public Net {
   virtual Error InferShape(Scope *scope) override;
 
   // Run all the operators with the `scope`, if no scope is provided, default
-  // scope will be used instead. If no OpContext is provicded, default context will be used.
+  // scope will be used instead. If no OpContext is provided, default context will be used.
   virtual Error Run(Scope *scope = nullptr, OpContext *context=nullptr,
                     OpIndex begin = -1, OpIndex end = -1) const override;
 
diff --git a/docs/design/others/graph_survey.md b/docs/design/others/graph_survey.md
index b4b824a2893..a3690d1f190 100644
--- a/docs/design/others/graph_survey.md
+++ b/docs/design/others/graph_survey.md
@@ -30,7 +30,7 @@ def get_symbol(num_classes=10, **kwargs):
 
 Variable here is actually a Symbol. Every basic Symbol will correspond to one Node, and every Node has its own AnyAttr. There is a op field in AnyAttr class, when a Symbol represents Variable(often input data), the op field is null.
 
-Symbol contains a data member, std::vector<NodeEntry> outputs, and NodeEntry cantains a poniter to Node. We can follow the Node pointer to get all the Graph.
+Symbol contains a data member, std::vector<NodeEntry> outputs, and NodeEntry contains a pointer to Node. We can follow the Node pointer to get all the Graph.
 
 And Symbol can be saved to a JSON file.
 
diff --git a/docs/design/phi/design_en.md b/docs/design/phi/design_en.md
index ee214a68e06..5173687327d 100644
--- a/docs/design/phi/design_en.md
+++ b/docs/design/phi/design_en.md
@@ -867,7 +867,7 @@ For the management of the new form of Kernel, the current design is as follows:
 Described as follows:
 
 - `KernelFactory` is a global singleton data structure for managing Kernel. Similar to `OpKernelMap` of fluid, it is a two-level map. The first-level mapping finds the Kernel set according to the name, and the second-level mapping finds the specific Kernel according to the KernelKey.
-- `KernelKey` is similar to the original `OpKernelType`, but the `palce` and `library_type` fields are combined into one and called `Backend`, because the original `LibraryType` is a limited enumeration class, which is strongly related to place, the splitting increases the cost of understanding instead.
+- `KernelKey` is similar to the original `OpKernelType`, but the `place` and `library_type` fields are combined into one and called `Backend`, because the original `LibraryType` is a limited enumeration class, which is strongly related to place, so splitting them would instead increase the cost of understanding.
 - `Kernel` holds more information than the original `OpKernel`. In addition to the Function during execution, it also holds information about specific parameters, namely `KernelArgsDef`. For Tensor type input and output, it saves Tensor type information, Device, data Type, data layout. For Attribute type input and output, it saves type information.
diff --git a/docs/dev_guides/api_contributing_guides/auto_parallel_op_cn.md b/docs/dev_guides/api_contributing_guides/auto_parallel_op_cn.md
index 91454acda5c..f0a3d7f34cb 100644
--- a/docs/dev_guides/api_contributing_guides/auto_parallel_op_cn.md
+++ b/docs/dev_guides/api_contributing_guides/auto_parallel_op_cn.md
@@ -107,7 +107,7 @@ SpmdInfo ElementwiseBinaryInferSpmd(const DistMetaTensor& x,
   std::string x_axes, y_axes, out_axes;
   GetBinaryNotations(x_shape, y_shape, &x_axes, &y_axes, &out_axes);
 
-  // Step2: Sharding Propogation
+  // Step2: Sharding Propagation
   // Step2.1: 合并输入的 dims mapping,得到每一维度对应的 dims mapping 值。
   // 调用 ShardingMergeForTensors 可以对输入 dims mapping 进行合并,返回的 map 即为
   // 每一维度对应的 dims mapping 值。
diff --git a/docs/dev_guides/custom_device_docs/event_api_en.md b/docs/dev_guides/custom_device_docs/event_api_en.md
index 887bedb04b2..a41d90ff59b 100644
--- a/docs/dev_guides/custom_device_docs/event_api_en.md
+++ b/docs/dev_guides/custom_device_docs/event_api_en.md
@@ -12,7 +12,7 @@ C_Status (*create_event)(const C_Device device, C_Event* event)
 
 It creates an event, which is used to synchronize tasks of different streams within the framework. When the device does not support asynchronous execution, empty implementation of the API is required.
 
-### Paremeter
+### Parameter
 
 device - the device to be used
 
diff --git a/docs/dev_guides/style_guide_and_references/type_annotations_specification_cn.md b/docs/dev_guides/style_guide_and_references/type_annotations_specification_cn.md
index 6bf88ef1e2f..ed07d7c3da1 100644
--- a/docs/dev_guides/style_guide_and_references/type_annotations_specification_cn.md
+++ b/docs/dev_guides/style_guide_and_references/type_annotations_specification_cn.md
@@ -256,7 +256,7 @@ def filter_user(user: list[User], type: UserType) -> list[User]: ...
 
 ### 参数应尽可能使用抽象类型,返回值应尽可能使用具体类型
 
-对于函数输入参数,如果允许,我们应该尽可能使用 [Protocal](https://docs.python.org/3/library/typing.html#typing.Protocol),如 [Sequence](https://docs.python.org/3/library/collections.abc.html#collections.abc.Sequence)、[Mapping](https://docs.python.org/3/library/collections.abc.html#collections.abc.Mapping) 、[Iterable](https://docs.python.org/3/library/collections.abc.html#collections.abc.Iterable) 等抽象类型,以提高函数的通用性。而对于函数返回值,我们应该尽可能使用具体类型,以确保下游使用时能得到更好的提示效果。
+对于函数输入参数,如果允许,我们应该尽可能使用 [Protocol](https://docs.python.org/3/library/typing.html#typing.Protocol),如 [Sequence](https://docs.python.org/3/library/collections.abc.html#collections.abc.Sequence)、[Mapping](https://docs.python.org/3/library/collections.abc.html#collections.abc.Mapping) 、[Iterable](https://docs.python.org/3/library/collections.abc.html#collections.abc.Iterable) 等抽象类型,以提高函数的通用性。而对于函数返回值,我们应该尽可能使用具体类型,以确保下游使用时能得到更好的提示效果。
 
 比如相比于如下写法:
 
diff --git a/docs/eval/evaluation_of_docs_system.md b/docs/eval/evaluation_of_docs_system.md
index 3ca309cdd61..588ec95b9bc 100644
--- a/docs/eval/evaluation_of_docs_system.md
+++ b/docs/eval/evaluation_of_docs_system.md
@@ -261,13 +261,13 @@ TensorFlow 的文档规划,比较直接地匹配了本文所介绍的分类标
   - Single-Machine Model Parallel Best Practices
   - Getting Started with Distributed Data Parallel
   - Writing Distributed Applications with PyTorch
-  - Getting Started with Fully Sharded Data Prallel
+  - Getting Started with Fully Sharded Data Parallel
  - Customize Process Group Backends Using Cpp Extension
   - Getting Started with Distributed RPC Framework
   - Implementing a Parameter Server Using Distributed RPC Framework
   - Distributed Pipeline Parallelsim using RPC
   - Implementing Batch RPC Processing Using Asynchronous Executions
-  - Combining Distributed DataPrallel with Distributed RPC Framework
+  - Combining Distributed DataParallel with Distributed RPC Framework
   - Training Transformer models using Pipeline Parallelism
   - Training Transformer models using Distributed Data Parallel and Pipeline Parallelism
   - Distributed Training with Uneven Inputs Using the Join Context Manager
@@ -562,7 +562,7 @@ MindSpore 的有自己独立的文档分类标准和风格,所以硬套本文
 | 移动端相关 | 独立的栏目 https://www.tensorflow.org/lite | 10+ | Image Segmentation DeepLabV3 on iOS Image Segmentation DeepLabV3 on Android | 2 | | 0 | Paddle Lite 中独立存在 | 未统计 |
 | 框架之间的迁移相关 | | | | 0 | 概述 准备工作 网络脚本分析 网络脚本开发 网络调试 精度调试 性能调试 推理执行 网络迁移调试实例 常见问题 | 10 | Paddle 1.8 与 Paddle 2.0 API 映射表 PyTorch-PaddlePaddle API 映射表 版本迁移工具 | 3 |
 | 自定义算子 | Tensors and operations Custom layers Custom training: walkthrough Create an op Extension types | 5 | Double Backward with Custom Functions Fusing Convolution and Batch Norm using Custom Function Custom C++ and CUDA Extensions Extending TorchScript with Custom C++ Operators Extending TorchScript with Custom C++ Classes Registering a Dispatched Operator in C++ Extending dispatcher for a new backend in C++ | 7 | 算子分类 运算重载 自定义算子(CPU) 自定义算子(GPU) 自定义算子(Ascend) 自定义算子(基于 Custom 表达) | 6 | 自定义原生算子 原生算子开发注意事项 自定义外部算子 自定义 Python 算子 API 介绍 API 示例 本地开发指南 提交 PR 注意事项 FAQ | 9 |
-| 分布式训练 | Distributed training with Kereas Distributed training with DTensors Using DTensors with Keras Custom training loops Multi-worker training with Keras Multi-worker training with CTL Parameter Server Training Distributed input Distributed training | 9 | PyTorch Distributed Overview Single-Machine Model Parallel Best PracticesGetting Started with Distributed Data Parallel Writing Distributed Applications with PyTorch Getting Started with Fully Sharded Data Prallel Customize Process Group Backends Using Cpp Extension Getting Started with Distributed RPC Framework Implementing a Parameter Server Using Distributed RPC Framework Distributed Pipeline Parallelsim using RPC Implementing Batch RPC Processing Using Asynchronous Executions Combining Distributed DataPrallel with Distributed RPC Framework Training Transformer models using Pipeline Parallelism Training Transformer models using Distributed Data Parallel and Pipeline Parallelism Distributed Training with Uneven Inputs Using the Join Context Manager | 16 | 分布式并行总览 分布式集合通信原语 分布式并行训练基础样例(Ascend) 分布式并行训练基础样例(GPU) 分布式推理 保存和加载模型(HyBrid Parallel 模式) 分布式并行训练 Transformer 模型 鹏程·盘古模型网络多维度混合并行解析 分布式故障恢复 | 9 | 单机多卡训练 分布式训练开始 使用 FleetAPI 进行分布式训练 | 3 |
+| 分布式训练 | Distributed training with Keras Distributed training with DTensors Using DTensors with Keras Custom training loops Multi-worker training with Keras Multi-worker training with CTL Parameter Server Training Distributed input Distributed training | 9 | PyTorch Distributed Overview Single-Machine Model Parallel Best Practices Getting Started with Distributed Data Parallel Writing Distributed Applications with PyTorch Getting Started with Fully Sharded Data Parallel Customize Process Group Backends Using Cpp Extension Getting Started with Distributed RPC Framework Implementing a Parameter Server Using Distributed RPC Framework Distributed Pipeline Parallelism using RPC Implementing Batch RPC Processing Using Asynchronous Executions Combining Distributed DataParallel with Distributed RPC Framework Training Transformer models using Pipeline Parallelism Training Transformer models using Distributed Data Parallel and Pipeline Parallelism Distributed Training with Uneven Inputs Using the Join Context Manager | 16 | 分布式并行总览 分布式集合通信原语 分布式并行训练基础样例(Ascend) 分布式并行训练基础样例(GPU) 分布式推理 保存和加载模型(HyBrid Parallel 模式) 分布式并行训练 Transformer 模型 鹏程·盘古模型网络多维度混合并行解析 分布式故障恢复 | 9 | 单机多卡训练 分布式训练开始 使用 FleetAPI 进行分布式训练 | 3 |
 | 框架设计文档 | Random number generation | 1 | 分散在 API 文档、源码中,其实比较丰富。30+ | 30+ | 设计白皮书 全场景统一 函数式微分编程 动静态图结合 异构并行训练 分布式并行 中间表达 MindIR 高性能数据处理引擎 图算融合加速引擎 二阶优化 可视化调试调优 安全可信 术语 | 13 | | 0 |
 | 其它 | Integrated gradients Uncertainty quantification with SNGP Probabilistic regression Keras 一级标题下的 13 篇文章 Thinking in TensorFlow 2 Data input pipelines 一级标题下的 3 篇 GPU TPU | 20 | Learn the Basics Quickstart Deep Learning with PyTorch: A 60 Minute Blitz Building a Convolution/Batch Norm fuser in FX Building a Simple CPU Performance Profiler with FX Channels Last Memory Format in PyTorch Forward-mode Automatic Differentiation Using the PyTorch C++ Frontend Dynamic Parallelism in TorchScript Autograd in C++ Frontend Static Quantization with Eager Model in PyTorch | 11 | 基本介绍 快速入门 进阶案例:线性拟合 混合精度 梯度累积算法 自适应梯度求和算法 降维训练算法 | 7 | 10 分钟快速上手飞桨 使用线性回归预测波士顿房价 模型导出 ONNX 协议 飞桨产品硬件支持表 昆仑芯 XPU 芯片运行飞桨 海光 DCU 芯片运行飞桨 昇腾 NPU 芯片运行飞桨 环境变量 FLAGS 下 9 篇 hello paddle:从普通程序走向机器学习程序 通过 AutoEncoder 实现时序数据异常检测 广播介绍 自动混合精度训练 梯度裁剪 升级指南 | 20+ |
diff --git a/docs/guides/jit/debugging_en.md b/docs/guides/jit/debugging_en.md
index ad75e68af94..55183b54219 100644
--- a/docs/guides/jit/debugging_en.md
+++ b/docs/guides/jit/debugging_en.md
@@ -58,7 +58,7 @@ The C++ error stack is hidden by default. You can set the C++ environment variab
 ## 2、Debugging Method
 Before debugging, **please ensure that the dynamic graph code before conversion can run successfully**. The following introduces several debugging methods recommended in Dynamic-to-Static.
 ### 2.1 Pdb Debugging
-pdb is a module in Python that defines an interactive Pyhton source code debugger. It supports setting breakpoints and single stepping between source lines, listing source code and variables, running Python code, etc.
+pdb is a module in Python that defines an interactive Python source code debugger. It supports setting breakpoints and single stepping between source lines, listing source code and variables, running Python code, etc.
 #### 2.1.1 Debugging steps
 - step1: Insert `import pdb; pdb.set_trace()` before the code where you want to enable pdb debugging.
 
diff --git a/docs/guides/model_convert/convert_from_pytorch/api_difference/torch_more_args/transformers.GenerationConfig.md b/docs/guides/model_convert/convert_from_pytorch/api_difference/torch_more_args/transformers.GenerationConfig.md
index 42aeecd5e97..9e25ac758ff 100644
--- a/docs/guides/model_convert/convert_from_pytorch/api_difference/torch_more_args/transformers.GenerationConfig.md
+++ b/docs/guides/model_convert/convert_from_pytorch/api_difference/torch_more_args/transformers.GenerationConfig.md
@@ -16,9 +16,9 @@ paddlenlp.generation.GenerationConfig(*kwargs)
 | transformers | PaddlePaddle | 备注 |
 | -------------------------------------| ------------------- | -------- |
 | max_length | max_length | 最大生成长度。 |
-| max_new_tokens | - | 最大生成长度(忽略 promot),Paddle 无此参数,一般对网络训练结果影响不大,可直接删除。|
+| max_new_tokens | - | 最大生成长度(忽略 prompt),Paddle 无此参数,一般对网络训练结果影响不大,可直接删除。|
 | min_length | min_length | 最小生成长度。 |
-| min_new_tokens | - | 最小生成长度(忽略 promot),Paddle 无此参数,一般对网络训练结果影响不大,可直接删除。 |
+| min_new_tokens | - | 最小生成长度(忽略 prompt),Paddle 无此参数,一般对网络训练结果影响不大,可直接删除。 |
 | early_stopping | early_stopping | 早停是否开启。 |
 | max_time | - | 最大允许计算运行时间,Paddle 无此参数,一般对网络训练结果影响不大,可直接删除。 |
 | do_sample | do_sample | 是否进行采样。 |
diff --git a/docs/guides/paddle_v3_features/paddle_ir_cn.md b/docs/guides/paddle_v3_features/paddle_ir_cn.md
index 4f4651f6dd3..567399b09fe 100644
--- a/docs/guides/paddle_v3_features/paddle_ir_cn.md
+++ b/docs/guides/paddle_v3_features/paddle_ir_cn.md
@@ -102,7 +102,7 @@ print(out)
 
 ### 2.多层级的 Dialect
 
-飞桨通过不同层级的 Dialect 来管理框架内不同领域的算子体系,比如 Built-in 下的 Shape Dialect 和 Control Flow Dialect,分别用户形状符号推导和控制流表示、与 PHI 算子库执行体系相关的 Operator Dialect 和 Kernel Dialect、与神经网络编译器领域相关的 CINN Dialect 等。在飞桨神经网络编译器中,主要以计算图 Operator Dialect 为输入,经过组合算子和 Pass Pipline 后,会转换为 CINN Dialect,并附加 Shape Dialect 中的符号信息,最后会 Lowering 成编译器的 AST IR。
+飞桨通过不同层级的 Dialect 来管理框架内不同领域的算子体系,比如 Built-in 下的 Shape Dialect 和 Control Flow Dialect,分别用于形状符号推导和控制流表示、与 PHI 算子库执行体系相关的 Operator Dialect 和 Kernel Dialect、与神经网络编译器领域相关的 CINN Dialect 等。在飞桨神经网络编译器中,主要以计算图 Operator Dialect 为输入,经过组合算子和 Pass Pipeline 后,会转换为 CINN Dialect,并附加 Shape Dialect 中的符号信息,最后会 Lowering 成编译器的 AST IR。
 上述这些多层级的 Dialect 内的算子 Op 会组成 Program ,并用来表示一个具体的模型。它包含两部分:计算图 和 权重 。
 * Value、Operation 用来对计算图进行抽象。Value 表示计算图中的有向边,他用来将两个 Operaton 关联起来,描述了程序中的 UD 链 ,Operation 表示计算图中的节点。一个 Operation 表示一个算子,它里面包含了零个或多个 Region 。Region 表示一个闭包,它里面包含了零个或多个 Block。Block 表示一个符合 SSA 的基本块,里面包含了零个或多个 Operation 。三者循环嵌套,可以实现任意复杂的语法结构。
 * Weight 用来对模型的权重参数进行单独存储,这也是深度学习框架和传统编译器不一样的地方。传统编译器会将数据段内嵌到程序里面。这是因为传统编译器里面,数据和代码是强绑定的,不可分割。但是对神经网络而言,一个计算图的每个 epoch 都会存在一份权重参数,多个计算图也有可能共同一份权重参数,二者不是强绑定的

From 619c1a8128c5a0b26050c20f2d40e0dc9af51c70 Mon Sep 17 00:00:00 2001
From: Yuqiang Ge <143453447+YqGe585@users.noreply.github.com>
Date: Tue, 28 Oct 2025 20:54:58 +0800
Subject: [PATCH 3/3] [API Compatibility] Add paddle.cuda apis (#7470)

* add apis

* add paddle.device

* fix COPY-FROM

* fix code example

* fix codestyle

* fix py:function

---------

Co-authored-by: Echo-Nie <157974576+Echo-Nie@users.noreply.github.com>
---
 docs/api/paddle/cuda/Overview_cn.rst          |  6 ++++
 docs/api/paddle/cuda/current_device_cn.rst    | 17 ++++++++++
 docs/api/paddle/cuda/device_count_cn.rst      | 17 ++++++++++
 docs/api/paddle/cuda/empty_cache_cn.rst       | 12 +++++++
 docs/api/paddle/cuda/memory_allocated_cn.rst  | 22 +++++++++++++
 docs/api/paddle/cuda/memory_reserved_cn.rst   | 22 +++++++++++++
 docs/api/paddle/cuda/set_device_cn.rst        | 17 ++++++++++
 docs/api/paddle/device/Overview_cn.rst        |  9 ++++++
 docs/api/paddle/device/device_count_cn.rst    | 22 +++++++++++++
 docs/api/paddle/device/empty_cache_cn.rst     | 18 +++++++++++
 .../device/get_device_properties_cn.rst       | 22 +++++++++++++
 .../paddle/device/max_memory_allocated_cn.rst | 28 +++++++++++++++++
 .../paddle/device/max_memory_reserved_cn.rst  | 28 +++++++++++++++++
 .../api/paddle/device/memory_allocated_cn.rst | 31 +++++++++++++++++++
 docs/api/paddle/device/memory_reserved_cn.rst | 28 +++++++++++++++++
 .../device/reset_max_memory_allocated_cn.rst  | 28 +++++++++++++++++
 .../device/reset_max_memory_reserved_cn.rst   | 28 +++++++++++++++++
 17 files changed, 355 insertions(+)
 create mode 100644 docs/api/paddle/cuda/current_device_cn.rst
 create mode 100644 docs/api/paddle/cuda/device_count_cn.rst
 create mode 100644 docs/api/paddle/cuda/empty_cache_cn.rst
 create mode 100644 docs/api/paddle/cuda/memory_allocated_cn.rst
 create mode 100644 docs/api/paddle/cuda/memory_reserved_cn.rst
 create mode 100644 docs/api/paddle/cuda/set_device_cn.rst
 create mode 100644 docs/api/paddle/device/device_count_cn.rst
 create mode 100644 docs/api/paddle/device/empty_cache_cn.rst
 create mode 100644 docs/api/paddle/device/get_device_properties_cn.rst
 create mode 100644 docs/api/paddle/device/max_memory_allocated_cn.rst
 create mode 100644 docs/api/paddle/device/max_memory_reserved_cn.rst
 create mode 100644 docs/api/paddle/device/memory_allocated_cn.rst
 create mode 100644 docs/api/paddle/device/memory_reserved_cn.rst
 create mode 100644 docs/api/paddle/device/reset_max_memory_allocated_cn.rst
 create mode 100644 docs/api/paddle/device/reset_max_memory_reserved_cn.rst

diff --git a/docs/api/paddle/cuda/Overview_cn.rst b/docs/api/paddle/cuda/Overview_cn.rst
index 31214bb21b5..3b6d87dc97f 100644
--- a/docs/api/paddle/cuda/Overview_cn.rst
+++ b/docs/api/paddle/cuda/Overview_cn.rst
@@ -18,3 +18,9 @@ PyTorch 兼容函数
     " :ref:`cudart <cn_api_paddle_cuda_cudart>` ", "以模块的形式返回 CUDA Runtime 对象"
     " :ref:`is_initialized <cn_api_paddle_cuda_is_initialized>` ", "判断 CUDA 是否已经初始化"
     " :ref:`mem_get_info <cn_api_paddle_cuda_mem_get_info>` ", "获取指定设备上的全局空闲显存和显存总量"
+    " :ref:`current_device <cn_api_paddle_cuda_current_device>` ", "返回当前设备的索引"
+    " :ref:`device_count <cn_api_paddle_cuda_device_count>` ", "返回可用的 CUDA 设备数量"
+    " :ref:`empty_cache <cn_api_paddle_cuda_empty_cache>` ", "释放当前设备上所有未占用的缓存内存"
+    " :ref:`memory_allocated <cn_api_paddle_cuda_memory_allocated>` ", "返回当前设备上分配的内存总量"
+    " :ref:`memory_reserved <cn_api_paddle_cuda_memory_reserved>` ", "返回当前设备上由缓存分配器管理的内存总量"
+    " :ref:`set_device <cn_api_paddle_cuda_set_device>` ", "设置当前设备"
diff --git a/docs/api/paddle/cuda/current_device_cn.rst b/docs/api/paddle/cuda/current_device_cn.rst
new file mode 100644
index 00000000000..e819c355981
--- /dev/null
+++ b/docs/api/paddle/cuda/current_device_cn.rst
@@ -0,0 +1,17 @@
+.. _cn_api_paddle_cuda_current_device:
+
+current_device
+--------------
+
+.. py:function:: paddle.cuda.current_device()
+
+返回当前设备的索引。
+
+返回
+::::::::::::
+
+    int, 当前设备的索引。
+
+代码示例
+::::::::::::
+COPY-FROM: paddle.cuda.current_device
diff --git a/docs/api/paddle/cuda/device_count_cn.rst b/docs/api/paddle/cuda/device_count_cn.rst
new file mode 100644
index 00000000000..aafe97fce57
--- /dev/null
+++ b/docs/api/paddle/cuda/device_count_cn.rst
@@ -0,0 +1,17 @@
+.. _cn_api_paddle_cuda_device_count:
+
+device_count
+------------
+
+.. py:function:: paddle.cuda.device_count()
+
+返回可用的计算卡设备数量。
+
+返回
+::::::::::::
+
+    int, 可用的计算卡设备数量。
+
+代码示例
+::::::::::::
+COPY-FROM: paddle.cuda.device_count
diff --git a/docs/api/paddle/cuda/empty_cache_cn.rst b/docs/api/paddle/cuda/empty_cache_cn.rst
new file mode 100644
index 00000000000..a5e84babf71
--- /dev/null
+++ b/docs/api/paddle/cuda/empty_cache_cn.rst
@@ -0,0 +1,12 @@
+.. _cn_api_paddle_cuda_empty_cache:
+
+empty_cache
+-----------
+
+.. py:function:: paddle.cuda.empty_cache()
+
+该函数用于释放显存分配器中空闲的显存,这样其他的 GPU 应用程序就可以使用释放出来的显存,并在 nvidia-smi 中可见。大多数情况下您不需要使用该函数,当您删除 GPU 上的 Tensor 时,Paddle 框架并不会将显存释放,而是将显存保留起来,以便在下一次申请显存时可以更快地完成分配。
+
+代码示例
+::::::::::::
+COPY-FROM: paddle.cuda.empty_cache
diff --git a/docs/api/paddle/cuda/memory_allocated_cn.rst b/docs/api/paddle/cuda/memory_allocated_cn.rst
new file mode 100644
index 00000000000..56547704641
--- /dev/null
+++ b/docs/api/paddle/cuda/memory_allocated_cn.rst
@@ -0,0 +1,22 @@
+.. _cn_api_paddle_cuda_memory_allocated:
+
+memory_allocated
+----------------
+
+.. py:function:: paddle.cuda.memory_allocated(device=None)
+
+返回给定设备上当前分配给 Tensor 的显存大小。
+
+参数
+::::::::::::
+
+    - **device** (DeviceLike) - 指定要查询的设备,可以是 "int" 用来表示设备 id,可以是形如 "gpu:0" 之类的设备描述字符串,也可以是 `paddle.CUDAPlace(0)` 之类的设备实例。如果为 None(默认值)或未指定设备索引,则返回由 ``paddle.device.get_device()`` 给出的当前设备的统计信息。
+
+返回
+::::::::::::
+
+    int, 当前设备上分配的内存总量(字节)。
+
+代码示例
+::::::::::::
+COPY-FROM: paddle.cuda.memory_allocated
diff --git a/docs/api/paddle/cuda/memory_reserved_cn.rst b/docs/api/paddle/cuda/memory_reserved_cn.rst
new file mode 100644
index 00000000000..1fbb7da12bf
--- /dev/null
+++ b/docs/api/paddle/cuda/memory_reserved_cn.rst
@@ -0,0 +1,22 @@
+.. _cn_api_paddle_cuda_memory_reserved:
+
+memory_reserved
+---------------
+
+.. py:function:: paddle.cuda.memory_reserved(device=None)
+
+返回当前设备上由缓存分配器管理的内存总量。
+
+参数
+::::::::::::
+
+    - **device** (DeviceLike) - 指定要查询的设备,可以是 "int" 用来表示设备 id,可以是形如 "gpu:0" 之类的设备描述字符串,也可以是 `paddle.CUDAPlace(0)` 之类的设备实例。如果为 None(默认值)或未指定设备索引,则返回由 ``paddle.device.get_device()`` 给出的当前设备的统计信息。
+
+返回
+::::::::::::
+
+    int, 当前设备上由缓存分配器管理的内存总量(字节)。
+
+代码示例
+::::::::::::
+COPY-FROM: paddle.cuda.memory_reserved
diff --git a/docs/api/paddle/cuda/set_device_cn.rst b/docs/api/paddle/cuda/set_device_cn.rst
new file mode 100644
index 00000000000..b0cf1a0caca
--- /dev/null
+++ b/docs/api/paddle/cuda/set_device_cn.rst
@@ -0,0 +1,17 @@
+.. _cn_api_paddle_cuda_set_device:
+
+set_device
+----------
+
+.. py:function:: paddle.cuda.set_device(device)
+
+设置当前设备。
+
+参数
+::::::::::::
+
+    - **device** (DeviceLike) - 要设置的设备,可以是 "int" 用来表示设备 id,可以是形如 "gpu:0" 之类的设备描述字符串,也可以是 `paddle.CUDAPlace(0)` 之类的设备实例。
+
+代码示例
+::::::::::::
+COPY-FROM: paddle.cuda.set_device
diff --git a/docs/api/paddle/device/Overview_cn.rst b/docs/api/paddle/device/Overview_cn.rst
index 2807177a593..1998042a026 100644
--- a/docs/api/paddle/device/Overview_cn.rst
+++ b/docs/api/paddle/device/Overview_cn.rst
@@ -25,11 +25,20 @@ paddle.device 目录下包含 cuda 目录和 xpu 目录, cuda 目录中存放
     :header: "API 名称", "API 功能"
     :widths: 10, 30
 
+    " :ref:`device_count <cn_api_paddle_device_device_count>` ", "返回指定设备类型的可用设备数量"
+    " :ref:`empty_cache <cn_api_paddle_device_empty_cache>` ", "释放当前设备上所有未占用的缓存内存"
     " :ref:`get_all_custom_device_type <cn_api_paddle_device_get_all_custom_device_type>` ", "获得所有可用的自定义设备类型"
     " :ref:`get_all_device_type <cn_api_paddle_device_get_all_device_type>` ", "获得所有可用的设备类型"
     " :ref:`get_available_custom_device <cn_api_paddle_device_get_available_custom_device>` ", "获得所有可用的自定义设备"
     " :ref:`get_available_device <cn_api_paddle_device_get_available_device>` ", "获得所有可用的设备"
     " :ref:`get_cudnn_version <cn_api_paddle_device_get_cudnn_version>` ", "获得 cudnn 的版本"
+    " :ref:`get_device_properties <cn_api_paddle_device_get_device_properties>` ", "返回指定设备的属性"
+    " :ref:`max_memory_allocated <cn_api_paddle_device_max_memory_allocated>` ", "返回给定设备上分配给 Tensor 的内存峰值统计"
+    " :ref:`max_memory_reserved <cn_api_paddle_device_max_memory_reserved>` ", "返回给定设备上由内存分配器管理的内存峰值统计"
+    " :ref:`memory_allocated <cn_api_paddle_device_memory_allocated>` ", "返回给定设备上当前分配给 Tensor 的内存大小"
+    " :ref:`memory_reserved <cn_api_paddle_device_memory_reserved>` ", "返回给定设备上当前由内存分配器管理的内存大小"
+    " :ref:`reset_max_memory_allocated <cn_api_paddle_device_reset_max_memory_allocated>` ", "重置给定设备上分配给 Tensor 的内存峰值统计"
+    " :ref:`reset_max_memory_reserved <cn_api_paddle_device_reset_max_memory_reserved>` ", "重置给定设备上由内存分配器管理的内存峰值统计"
     " :ref:`set_device <cn_api_paddle_device_set_device>` ", "指定 OP 运行的全局设备"
     " :ref:`get_device <cn_api_paddle_device_get_device>` ", "获得 OP 运行的全局设备"
 
diff --git a/docs/api/paddle/device/device_count_cn.rst b/docs/api/paddle/device/device_count_cn.rst
new file mode 100644
index 00000000000..5b459978fc8
--- /dev/null
+++ b/docs/api/paddle/device/device_count_cn.rst
@@ -0,0 +1,22 @@
+.. _cn_api_paddle_device_device_count:
+
+device_count
+------------
+
+.. py:function:: paddle.device.device_count(device=None)
+
+返回指定设备类型的可用设备数量。
+
+参数
+::::::::::::
+
+    - **device** (paddle.CUDAPlace|paddle.CustomPlace|paddle.XPUPlace|str|int,可选) - 设备、设备 ID 或形如 ``gpu:x``、``xpu:x`` 或自定义设备名称的设备字符串。如果为 None,则返回当前设备类型的可用设备数量。默认值为 None。
+
+返回
+::::::::::::
+
+    int,指定设备类型的可用设备数量。
+
+代码示例
+::::::::::::
+COPY-FROM: paddle.device.device_count
diff --git a/docs/api/paddle/device/empty_cache_cn.rst b/docs/api/paddle/device/empty_cache_cn.rst
new file mode 100644
index 00000000000..16acc4b036b
--- /dev/null
+++ b/docs/api/paddle/device/empty_cache_cn.rst
@@ -0,0 +1,18 @@
+.. _cn_api_paddle_device_empty_cache:
+
+empty_cache
+-----------
+
+.. py:function:: paddle.device.empty_cache()
+
+释放当前设备上所有未占用的缓存内存。
+
+代码示例
+::::::::::::
+.. code-block:: python
+
+    import paddle
+
+    x = paddle.randn([1000, 1000])
+    del x
+    paddle.device.empty_cache()
diff --git a/docs/api/paddle/device/get_device_properties_cn.rst b/docs/api/paddle/device/get_device_properties_cn.rst
new file mode 100644
index 00000000000..58b606d4fbe
--- /dev/null
+++ b/docs/api/paddle/device/get_device_properties_cn.rst
@@ -0,0 +1,22 @@
+.. _cn_api_paddle_device_get_device_properties:
+
+get_device_properties
+---------------------
+
+.. py:function:: paddle.device.get_device_properties(device=None)
+
+返回指定设备的属性。
+
+参数
+::::::::::::
+
+    - **device** (paddle.CUDAPlace|paddle.CustomPlace|paddle.XPUPlace|str|int,可选) - 设备、设备 ID 或形如 ``gpu:x``、``xpu:x`` 或自定义设备名称的设备字符串。如果为 None,则返回当前设备的属性。默认值为 None。
+
+返回
+::::::::::::
+
+    String,指定设备的属性,包括设备名称、主要计算能力、次要计算能力、全局可用内存和设备上的多处理器数量。
+
+代码示例
+::::::::::::
+COPY-FROM: paddle.device.get_device_properties
diff --git a/docs/api/paddle/device/max_memory_allocated_cn.rst b/docs/api/paddle/device/max_memory_allocated_cn.rst
new file mode 100644
index 00000000000..ca83c00828c
--- /dev/null
+++ b/docs/api/paddle/device/max_memory_allocated_cn.rst
@@ -0,0 +1,28 @@
+.. _cn_api_paddle_device_max_memory_allocated:
+
+max_memory_allocated
+--------------------
+
+.. py:function:: paddle.device.max_memory_allocated(device=None)
+
+返回给定设备上分配给 Tensor 的内存峰值统计。
+
+参数
+::::::::::::
+
+    - **device** (paddle.CUDAPlace|paddle.CustomPlace|paddle.XPUPlace|str|int,可选) - 设备、设备 ID 或形如 ``gpu:x``、``xpu:x`` 或自定义设备名称的设备字符串。如果为 None,则返回当前设备的统计信息。默认值为 None。
+
+返回
+::::::::::::
+
+    int,给定设备上分配给 Tensor 的内存峰值统计,以字节为单位。
+
+代码示例
+::::::::::::
+.. code-block:: python
+
+    >>> import paddle
+    >>> paddle.device.max_memory_allocated('npu:0')
+    >>> paddle.device.max_memory_allocated('npu')
+    >>> paddle.device.max_memory_allocated(0)
+    >>> paddle.device.max_memory_allocated(paddle.CustomPlace('npu', 0))
diff --git a/docs/api/paddle/device/max_memory_reserved_cn.rst b/docs/api/paddle/device/max_memory_reserved_cn.rst
new file mode 100644
index 00000000000..d077f21fe08
--- /dev/null
+++ b/docs/api/paddle/device/max_memory_reserved_cn.rst
@@ -0,0 +1,28 @@
+.. _cn_api_paddle_device_max_memory_reserved:
+
+max_memory_reserved
+-------------------
+
+.. py:function:: paddle.device.max_memory_reserved(device=None)
+
+返回给定设备上由内存分配器管理的内存峰值统计。
+
+参数
+::::::::::::
+
+    - **device** (paddle.CUDAPlace|paddle.CustomPlace|paddle.XPUPlace|str|int,可选) - 设备、设备 ID 或形如 ``gpu:x``、``xpu:x`` 或自定义设备名称的设备字符串。如果为 None,则返回当前设备的统计信息。默认值为 None。
+
+返回
+::::::::::::
+
+    int,给定设备上由内存分配器管理的内存峰值统计,以字节为单位。
+
+代码示例
+::::::::::::
+.. code-block:: python
+
+    >>> import paddle
+    >>> paddle.device.max_memory_reserved('npu:0')
+    >>> paddle.device.max_memory_reserved('npu')
+    >>> paddle.device.max_memory_reserved(0)
+    >>> paddle.device.max_memory_reserved(paddle.CustomPlace('npu', 0))
diff --git a/docs/api/paddle/device/memory_allocated_cn.rst b/docs/api/paddle/device/memory_allocated_cn.rst
new file mode 100644
index 00000000000..1742bf44db9
--- /dev/null
+++ b/docs/api/paddle/device/memory_allocated_cn.rst
@@ -0,0 +1,31 @@
+.. _cn_api_paddle_device_memory_allocated:
+
+memory_allocated
+----------------
+
+.. py:function:: paddle.device.memory_allocated(device=None)
+
+返回给定设备上当前分配给 Tensor 的内存大小。
+
+.. note::
+    Paddle 中分配给 Tensor 的内存块大小会进行 256 字节对齐,因此可能大于 Tensor 实际需要的内存大小。例如,一个 shape 为[1]的 float32 类型 Tensor 会占用 256 字节的内存,即使存储一个 float32 类型数据实际只需要 4 字节。
+
+参数
+::::::::::::
+
+    - **device** (paddle.CUDAPlace|paddle.CustomPlace|paddle.XPUPlace|str|int,可选) - 设备、设备 ID 或形如 ``gpu:x``、``xpu:x`` 或自定义设备名称的设备字符串。如果为 None,则返回当前设备的统计信息。默认值为 None。
+
+返回
+::::::::::::
+
+    int,给定设备上当前分配给 Tensor 的内存大小,以字节为单位。
+
+代码示例
+::::::::::::
+.. code-block:: python
+
+    >>> import paddle
+    >>> paddle.device.memory_allocated('npu:0')
+    >>> paddle.device.memory_allocated('npu')
+    >>> paddle.device.memory_allocated(0)
+    >>> paddle.device.memory_allocated(paddle.CustomPlace('npu', 0))
diff --git a/docs/api/paddle/device/memory_reserved_cn.rst b/docs/api/paddle/device/memory_reserved_cn.rst
new file mode 100644
index 00000000000..61730add199
--- /dev/null
+++ b/docs/api/paddle/device/memory_reserved_cn.rst
@@ -0,0 +1,28 @@
+.. _cn_api_paddle_device_memory_reserved:
+
+memory_reserved
+---------------
+
+.. py:function:: paddle.device.memory_reserved(device=None)
+
+返回给定设备上当前由内存分配器管理的内存大小。
+
+参数
+::::::::::::
+
+    - **device** (paddle.CUDAPlace|paddle.CustomPlace|paddle.XPUPlace|str|int,可选) - 设备、设备 ID 或形如 ``gpu:x``、``xpu:x`` 或自定义设备名称的设备字符串。如果为 None,则返回当前设备的统计信息。默认值为 None。
+
+返回
+::::::::::::
+
+    int,给定设备上当前由内存分配器管理的内存大小,以字节为单位。
+
+代码示例
+::::::::::::
+.. code-block:: python
+
+    >>> import paddle
+    >>> paddle.device.memory_reserved('npu:0')
+    >>> paddle.device.memory_reserved('npu')
+    >>> paddle.device.memory_reserved(0)
+    >>> paddle.device.memory_reserved(paddle.CustomPlace('npu', 0))
diff --git a/docs/api/paddle/device/reset_max_memory_allocated_cn.rst b/docs/api/paddle/device/reset_max_memory_allocated_cn.rst
new file mode 100644
index 00000000000..4284f923739
--- /dev/null
+++ b/docs/api/paddle/device/reset_max_memory_allocated_cn.rst
@@ -0,0 +1,28 @@
+.. _cn_api_paddle_device_reset_max_memory_allocated:
+
+reset_max_memory_allocated
+--------------------------
+
+.. py:function:: paddle.device.reset_max_memory_allocated(device=None)
+
+重置给定设备上分配给 Tensor 的内存峰值统计。
+
+参数
+::::::::::::
+
+    - **device** (paddle.CUDAPlace|paddle.CustomPlace|paddle.XPUPlace|str|int,可选) - 设备、设备 ID 或形如 ``gpu:x``、``xpu:x`` 或自定义设备名称的设备字符串。如果为 None,则重置当前设备的统计信息。默认值为 None。
+
+返回
+::::::::::::
+
+    None
+
+代码示例
+::::::::::::
+.. code-block:: python
+
+    >>> import paddle
+    >>> paddle.device.reset_max_memory_allocated('npu:0')
+    >>> paddle.device.reset_max_memory_allocated('npu')
+    >>> paddle.device.reset_max_memory_allocated(0)
+    >>> paddle.device.reset_max_memory_allocated(paddle.CustomPlace('npu', 0))
diff --git a/docs/api/paddle/device/reset_max_memory_reserved_cn.rst b/docs/api/paddle/device/reset_max_memory_reserved_cn.rst
new file mode 100644
index 00000000000..75355a40e9a
--- /dev/null
+++ b/docs/api/paddle/device/reset_max_memory_reserved_cn.rst
@@ -0,0 +1,28 @@
+.. _cn_api_paddle_device_reset_max_memory_reserved:
+
+reset_max_memory_reserved
+-------------------------
+
+.. py:function:: paddle.device.reset_max_memory_reserved(device=None)
+
+重置给定设备上由内存分配器管理的内存峰值统计。
+
+参数
+::::::::::::
+
+    - **device** (paddle.CUDAPlace|paddle.CustomPlace|paddle.XPUPlace|str|int,可选) - 设备、设备 ID 或形如 ``gpu:x``、``xpu:x`` 或自定义设备名称的设备字符串。如果为 None,则重置当前设备的统计信息。默认值为 None。
+
+返回
+::::::::::::
+
+    None
+
+代码示例
+::::::::::::
+.. code-block:: python
+
+    >>> import paddle
+    >>> paddle.device.reset_max_memory_reserved('npu:0')
+    >>> paddle.device.reset_max_memory_reserved('npu')
+    >>> paddle.device.reset_max_memory_reserved(0)
+    >>> paddle.device.reset_max_memory_reserved(paddle.CustomPlace('npu', 0))
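
The memory-statistics APIs documented in the last patch compose naturally. The following is a minimal illustrative sketch, not taken from the patches above, of how the newly documented `paddle.device` calls might be used together. It assumes a CUDA-enabled Paddle build in which these APIs are available; substitute an `npu` or `xpu` device string on other backends, as the patch's own examples do. The 256-byte figure echoes the alignment note in `memory_allocated_cn.rst`.

```python
import paddle

# Assumes a CUDA-enabled build; use "npu:0" or "xpu:0" on other backends.
dev = "gpu:0"
paddle.device.set_device(dev)

before = paddle.device.memory_allocated(dev)

# Per the note in memory_allocated_cn.rst, allocations are 256-byte aligned,
# so a shape-[1] float32 tensor (4 bytes of payload) occupies a 256-byte block.
x = paddle.ones([1], dtype="float32")
print("allocated for x:", paddle.device.memory_allocated(dev) - before)  # 256

# Peak statistics persist until explicitly reset.
print("peak allocated:", paddle.device.max_memory_allocated(dev))
paddle.device.reset_max_memory_allocated(dev)

# Deleting the tensor returns its block to the caching allocator; the memory
# stays reserved until empty_cache() releases it back to the device.
del x
print("still reserved:", paddle.device.memory_reserved(dev))
paddle.device.empty_cache()
```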