[CANN]: Replace aclrtMemsetSync with InplaceZero operator for zero tensor creation #14002
base: master
…te zero tensors more efficiently and consistently within the computation graph
Thanks for your contribution!
The format check failed. Please fix the formatting according to the test output.
ggml/src/ggml-cann/aclnn_ops.cpp
Outdated
@@ -834,6 +837,7 @@ static aclTensor* aclnn_values(ggml_backend_cann_context& ctx, void* buffer,
float value = 1.0f) {
aclTensor* acl_tensor =
aclnn_zero(ctx, buffer, n_bytes, ne, dims, type, type_size);
Revert this change if unnecessary.
ggml/src/ggml-cann/aclnn_ops.cpp
Outdated
@@ -67,6 +67,7 @@
#include <aclnnop/aclnn_pow.h>
#include <aclnnop/aclnn_grouped_matmul_v2.h>
#include <aclnnop/aclnn_fused_infer_attention_score_v2.h>
#include "aclnnop/aclnn_zero.h"
Suggested change: #include <aclnnop/aclnn_zero.h> (angle brackets, consistent with the other aclnnop includes).
ggml/src/ggml-cann/aclnn_ops.cpp
Outdated
aclTensor* zero =
ggml_cann_create_tensor(buffer, type, type_size, ne, nb, dims);
unused parameter ‘n_bytes’
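Once the zero tensor is built from the shape and strides instead of a byte count, n_bytes is no longer read, which is what triggers this warning. A minimal sketch of two common fixes follows. SKETCH_UNUSED, sketch_nelements and sketch_nelements_unnamed are hypothetical names used only for illustration; ggml's own GGML_UNUSED macro follows the same cast-to-void pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same idea as ggml's GGML_UNUSED: casting to void marks the value as
// deliberately unused, which silences -Wunused-parameter.
#define SKETCH_UNUSED(x) (void)(x)

// Hypothetical helper: n_bytes stays in the signature so existing call sites
// still compile, but the element count is derived from the shape instead.
static std::int64_t sketch_nelements(std::size_t n_bytes, const std::int64_t* ne,
                                     std::int64_t dims) {
    SKETCH_UNUSED(n_bytes);
    assert(dims >= 0);
    std::int64_t n = 1;
    for (std::int64_t i = 0; i < dims; ++i) {
        n *= ne[i];
    }
    return n;
}

// Alternative: leave the parameter unnamed; the compiler has no name to warn about.
static std::int64_t sketch_nelements_unnamed(std::size_t /*n_bytes*/, const std::int64_t* ne,
                                             std::int64_t dims) {
    assert(dims >= 0);
    std::int64_t n = 1;
    for (std::int64_t i = 0; i < dims; ++i) {
        n *= ne[i];
    }
    return n;
}
```

Keeping the parameter avoids touching every caller; dropping it entirely is cleaner if the function is private and all call sites can be updated.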
Thanks for your feedback! I've corrected the formatting issues as suggested and updated the PR accordingly. Please have a look when you get a chance.
In performance tests on the Atlas 300V NPU, we observed that the aclrtMemsetSync operation introduced significant latency when creating zero tensors.
To address this, we replaced aclrtMemsetSync with the InplaceZero operator. Because the zero fill now runs as an operator on the stream, as part of the computation graph, rather than as a blocking host-side call, zero tensors are created without the extra synchronization, which improves execution performance.
This also helps ensure better integration with graph-level optimizations and memory management.
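To make the latency argument concrete, here is a toy host-side model in C++. It does not use the CANN runtime: ToyStream, zero_blocking, zero_enqueued and count_host_stalls are hypothetical names. It assumes that a blocking fill first waits for work already queued on the stream (so it stays ordered with that work) and then blocks the host, while an operator such as InplaceZero is simply queued and executed in stream order. Counting host stalls shows why N blocking fills add N waits while N enqueued fills add none beyond the final synchronization.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Toy model of a device stream, NOT the CANN runtime: queued work runs only
// when the host synchronizes, and every synchronize counts as one host stall.
class ToyStream {
public:
    // Queue work without blocking the host.
    void enqueue(std::function<void()> op) { pending_.push(std::move(op)); }

    // Run all queued work in order; the host waits here.
    void synchronize() {
        while (!pending_.empty()) {
            pending_.front()();
            pending_.pop();
        }
        ++host_stalls_;
    }

    int host_stalls() const { return host_stalls_; }

private:
    std::queue<std::function<void()>> pending_;
    int host_stalls_ = 0;
};

// Models a blocking fill (assumption): wait for earlier queued work,
// then zero the buffer while the host is blocked.
void zero_blocking(ToyStream& stream, std::vector<float>& buf) {
    stream.synchronize();
    std::fill(buf.begin(), buf.end(), 0.0f);
}

// Models an in-graph zero operator: the fill is queued like any other op.
void zero_enqueued(ToyStream& stream, std::vector<float>& buf) {
    stream.enqueue([&buf] { std::fill(buf.begin(), buf.end(), 0.0f); });
}

// Zero n buffers with one strategy, synchronize once at the end,
// and return how many times the host had to wait.
int count_host_stalls(bool enqueued, int n) {
    ToyStream stream;
    std::vector<std::vector<float>> bufs(static_cast<std::size_t>(n),
                                         std::vector<float>(8, 1.0f));
    for (auto& buf : bufs) {
        if (enqueued) {
            zero_enqueued(stream, buf);
        } else {
            zero_blocking(stream, buf);
        }
    }
    stream.synchronize();
    // Both strategies must leave every buffer zeroed.
    for (const auto& buf : bufs) {
        for (float v : buf) {
            assert(v == 0.0f);
        }
    }
    return stream.host_stalls();
}
```

For example, count_host_stalls(false, 4) returns 5 (one wait per fill plus the final one), while count_host_stalls(true, 4) returns 1. The model only counts waits; it says nothing about the real cost of each wait on an Atlas device.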