Introduce Graph Profiler #9659
base: master
@max-krasnyansky I am using the graph-profiler branch but I'm unsure how to trigger and get the profiling details. Any docs, commands or references would be appreciated. Thanks.
6246824 to e7e9a7f
Sorry for the delay. Here is how to build (arm64-ubuntu):
And here is how to run:
This will get you the output I included in the PR.
d4051c8 to a362c74
Hi, I am also trying to find out how to profile properly with llama.cpp. In my case, I would like to know the performance beyond the node level. For example, I would like to know the aggregated time of all nodes generated by
I think a good approach could be that for each
a362c74 to ca40774
For that, I'm thinking it might make sense to insert dummy graph nodes that record profiling data.
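To make the dummy-node idea concrete, here is a rough sketch in Python (for brevity, rather than ggml's C). It is a toy executor, not ggml code: `Node`, `run_graph`, and the region labels are all hypothetical names invented for illustration. A node without an `op` acts as a profiling marker that closes the currently open region and opens the one it names, so time is aggregated across groups of nodes rather than per node.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """Toy graph node (hypothetical, not a ggml type).

    A node with `op` set does real work; a node without `op` is a
    profiling marker whose `region` names the region it opens.
    """
    name: str
    op: Optional[Callable[[], None]] = None
    region: Optional[str] = None


def run_graph(
    nodes: List[Node],
    clock: Callable[[], int] = time.perf_counter_ns,
) -> Dict[str, int]:
    """Run nodes in order and return accumulated nanoseconds per region."""
    totals: Dict[str, int] = {}
    current: Optional[str] = None  # region currently being timed
    started = 0
    for node in nodes:
        if node.op is not None:
            node.op()  # ordinary compute node
            continue
        # Marker node: close the open region, then open the next one.
        now = clock()
        if current is not None:
            totals[current] = totals.get(current, 0) + (now - started)
        current, started = node.region, now
    if current is not None:  # region left open at the end of the graph
        totals[current] = totals.get(current, 0) + (clock() - started)
    return totals
```

A marker with `region=None` only closes the open region. Because the markers are ordinary entries in the node list, the executor needs no knowledge of which builder function produced which nodes; the code that builds the graph decides where the markers go.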
Here is an attempt at reintroducing the original whole-graph profiler (LLAMA_PERF) with some additional features.
Not ready for the merge into master but useful for profiling different models (on CPU).
Features:
Known issues:
`ggml_init_param.graph_profile` or it'll be moved into the backend params

If there is interest it should be easy to extend to other backends, where they could update per-node/per-thread `ggml_profile_timing` data (they'd have to collect it on the accelerator and then export it into this common format).

See original PR #9647 for additional details.
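As an illustration of what a shared per-node/per-thread timing format could look like from a consumer's point of view, here is a small Python sketch. It is an assumption made for illustration only: `TimingRecord` and `per_node_summary` are hypothetical names, and the field layout is not the actual `ggml_profile_timing` structure from this PR. The point is that once every backend exports the same kind of record, a single summarizer works for all of them.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class TimingRecord:
    """Hypothetical common record; NOT the PR's ggml_profile_timing layout."""
    node: str      # graph node name
    thread: int    # CPU worker thread or accelerator queue index
    start_ns: int  # start timestamp in nanoseconds
    end_ns: int    # end timestamp in nanoseconds


def per_node_summary(records: Iterable[TimingRecord]) -> Dict[str, Dict[str, int]]:
    """Summarize per-thread durations for each node.

    max_ns approximates the node's wall time (the slowest thread),
    sum_ns is the total time spent across all threads.
    """
    durations: Dict[str, List[int]] = defaultdict(list)
    for rec in records:
        durations[rec.node].append(rec.end_ns - rec.start_ns)
    return {
        node: {"threads": len(d), "max_ns": max(d), "sum_ns": sum(d)}
        for node, d in durations.items()
    }
```

Under this assumption, an accelerator backend would only need to convert its device-side timestamps into such records after the graph finishes; the reporting code would stay backend-agnostic.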
Example of the terminal output
Same example in rendered Markdown