_get_started/pytorch.md (+4 -4)
@@ -283,7 +283,7 @@ The minifier automatically reduces the issue you are seeing to a small snippet o
If you are not seeing the speedups that you expect, then we have the **torch.\_dynamo.explain** tool that explains which parts of your code induced what we call “graph breaks”. Graph breaks generally hinder the compiler from speeding up the code, and reducing the number of graph breaks likely will speed up your code (up to some limit of diminishing returns).
- You can read about these and more in our [troubleshooting guide](https://pytorch.org/docs/stable/dynamo/troubleshooting.html).
+ You can read about these and more in our [troubleshooting guide](https://pytorch.org/docs/stable/torch.compiler_troubleshooting.html).
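For concreteness, the `torch._dynamo.explain` tool mentioned above can be invoked roughly as follows; this is a minimal sketch, and the exact call signature and report format have shifted between 2.0 and later releases:

```python
import torch

def fn(x):
    x = torch.sin(x)
    print("side effect")  # untraceable Python side effect -> graph break
    return torch.cos(x)

# Reports the number of captured graphs, graph breaks, and their reasons.
explanation = torch._dynamo.explain(fn)(torch.randn(10))
print(explanation)
```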
### Dynamic Shapes
@@ -496,7 +496,7 @@ In 2.0, if you wrap your model in `model = torch.compile(model)`, your model goe
3. Graph compilation, where the kernels call their corresponding low-level device-specific operations.
9. **What new components does PT2.0 add to PT?**
- - **TorchDynamo** generates FX Graphs from Python bytecode. It maintains the eager-mode capabilities using [guards](https://pytorch.org/docs/stable/dynamo/guards-overview.html#caching-and-guards-overview) to ensure the generated graphs are valid ([read more](https://dev-discuss.pytorch.org/t/torchdynamo-an-experiment-in-dynamic-python-bytecode-transformation/361))
+ - **TorchDynamo** generates FX Graphs from Python bytecode. It maintains the eager-mode capabilities using [guards](https://pytorch.org/docs/stable/torch.compiler_guards_overview.html#caching-and-guards-overview) to ensure the generated graphs are valid ([read more](https://dev-discuss.pytorch.org/t/torchdynamo-an-experiment-in-dynamic-python-bytecode-transformation/361))
- **AOTAutograd** to generate the backward graph corresponding to the forward graph captured by TorchDynamo ([read more](https://dev-discuss.pytorch.org/t/torchdynamo-update-6-training-support-with-aotautograd/570)).
- **PrimTorch** to decompose complicated PyTorch operations into simpler and more elementary ops ([read more](https://dev-discuss.pytorch.org/t/tracing-with-primitives-update-2/645)).
- **\[Backend]** Backends integrate with TorchDynamo to compile the graph into IR that can run on accelerators. For example, **TorchInductor** compiles the graph to either **Triton** for GPU execution or **OpenMP** for CPU execution ([read more](https://dev-discuss.pytorch.org/t/torchinductor-a-pytorch-native-compiler-with-define-by-run-ir-and-symbolic-shapes/747)).
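To make the TorchDynamo-to-backend handoff above concrete, here is a minimal sketch of a pass-through backend that prints the FX graph Dynamo captured (`inspect_backend` and `fn` are illustrative names, not part of the docs):

```python
import torch

def inspect_backend(gm: torch.fx.GraphModule, example_inputs):
    # gm is the FX graph TorchDynamo extracted from Python bytecode.
    gm.graph.print_tabular()
    return gm.forward  # run the captured graph as-is, with no optimization

@torch.compile(backend=inspect_backend)
def fn(x):
    return torch.relu(x) + 1.0

fn(torch.randn(4))
```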
@@ -511,10 +511,10 @@ DDP and FSDP in Compiled mode can run up to 15% faster than Eager-Mode in FP32
The [PyTorch Developers forum](http://dev-discuss.pytorch.org/) is the best place to learn about 2.0 components directly from the developers who build them.
13. **Help my code is running slower with 2.0’s Compiled Mode!**
- The most likely reason for performance hits is too many graph breaks. For instance, something innocuous as a print statement in your model’s forward triggers a graph break. We have ways to diagnose these - read more [here](https://pytorch.org/docs/stable/dynamo/faq.html#why-am-i-not-seeing-speedups).
+ The most likely reason for performance hits is too many graph breaks. For instance, something innocuous as a print statement in your model’s forward triggers a graph break. We have ways to diagnose these - read more [here](https://pytorch.org/docs/stable/torch.compiler_faq.html#why-am-i-not-seeing-speedups).
14. **My previously-running code is crashing with 2.0’s Compiled Mode! How do I debug it?**
- Here are some techniques to triage where your code might be failing, and printing helpful logs: [https://pytorch.org/docs/stable/dynamo/faq.html#why-is-my-code-crashing](https://pytorch.org/docs/stable/dynamo/faq.html#why-is-my-code-crashing).
+ Here are some techniques to triage where your code might be failing, and printing helpful logs: [https://pytorch.org/docs/stable/torch.compiler_faq.html#why-is-my-code-crashing](https://pytorch.org/docs/stable/torch.compiler_faq.html#why-is-my-code-crashing).
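Both FAQ entries come down to finding graph breaks and recompilations. Assuming PyTorch 2.1 or later, a quick way to surface them is the logging API sketched below (`model` and `example_input` are placeholders):

```python
import torch

# Log each graph break (with its reason) and each recompilation trigger.
torch._logging.set_logs(graph_breaks=True, recompiles=True)

compiled = torch.compile(model)   # model: your nn.Module
compiled(example_input)           # breaks/recompiles are logged as they occur
```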
_posts/2023-04-14-accelerated-generative-diffusion-models.md (+5 -5)
@@ -156,9 +156,9 @@ model = torch.compile(model)
```
- PyTorch compiler then turns Python code into a set of instructions which can be executed efficiently without Python overhead. The compilation happens dynamically the first time the code is executed. With the default behavior, under the hood PyTorch utilized [TorchDynamo](https://pytorch.org/docs/master/dynamo/index.html) to compile the code and [TorchInductor](https://dev-discuss.pytorch.org/t/torchinductor-a-pytorch-native-compiler-with-define-by-run-ir-and-symbolic-shapes/747) to further optimize it. See [this tutorial](https://pytorch.org/tutorials/intermediate/dynamo_tutorial.html) for more details.
+ PyTorch compiler then turns Python code into a set of instructions which can be executed efficiently without Python overhead. The compilation happens dynamically the first time the code is executed. With the default behavior, under the hood PyTorch utilized [TorchDynamo](https://pytorch.org/docs/stable/torch.compiler) to compile the code and [TorchInductor](https://dev-discuss.pytorch.org/t/torchinductor-a-pytorch-native-compiler-with-define-by-run-ir-and-symbolic-shapes/747) to further optimize it. See [this tutorial](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) for more details.
- Although the one-liner above is enough for compilation, certain modifications in the code can squeeze a larger speedup. In particular, one should avoid so-called graph breaks - places in the code which PyTorch can’t compile. As opposed to previous PyTorch compilation approaches (like TorchScript), PyTorch 2 compiler doesn’t break in this case. Instead it falls back on eager execution - so the code runs, but with reduced performance. We introduced a few minor changes to the model code to get rid of graph breaks. This included eliminating functions from libraries not supported by the compiler, such as `inspect.isfunction` and `einops.rearrange`. See this [doc](https://pytorch.org/docs/master/dynamo/faq.html#identifying-the-cause-of-a-graph-break) to learn more about graph breaks and how to eliminate them.
+ Although the one-liner above is enough for compilation, certain modifications in the code can squeeze a larger speedup. In particular, one should avoid so-called graph breaks - places in the code which PyTorch can’t compile. As opposed to previous PyTorch compilation approaches (like TorchScript), PyTorch 2 compiler doesn’t break in this case. Instead it falls back on eager execution - so the code runs, but with reduced performance. We introduced a few minor changes to the model code to get rid of graph breaks. This included eliminating functions from libraries not supported by the compiler, such as `inspect.isfunction` and `einops.rearrange`. See this [doc](https://pytorch.org/docs/stable/torch.compiler_faq.html#identifying-the-cause-of-a-graph-break) to learn more about graph breaks and how to eliminate them.
Theoretically, one can apply `torch.compile` on the whole diffusion sampling loop. However, in practice it is enough to just compile the U-Net. The reason is that `torch.compile` doesn’t yet have a loop analyzer and would recompile the code for each iteration of the sampling loop. Moreover, compiled sampler code is likely to generate graph breaks - so one would need to adjust it if one wants to get a good performance from the compiled version.
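As a sketch of the compile-just-the-U-Net advice, assuming the diffusers `StableDiffusionPipeline` API rather than the blog's exact code:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Compile only the U-Net: it dominates the runtime, while compiling the
# full sampling loop would recompile (or graph-break) on every iteration.
pipe.unet = torch.compile(pipe.unet)

image = pipe("a photo of an astronaut riding a horse").images[0]
```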
@@ -503,9 +503,9 @@ See if you can increase performance of open source diffusion models using the me
* PyTorch 2.0 overview, which has a lot of information on `torch.compile`: [https://pytorch.org/get-started/pytorch-2.0/](https://pytorch.org/get-started/pytorch-2.0/)
* Tutorial on `torch.compile`: [https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html)
- * General compilation troubleshooting: [https://pytorch.org/docs/master/dynamo/troubleshooting.html](https://pytorch.org/docs/master/dynamo/troubleshooting.html)
- * Details on graph breaks: [https://pytorch.org/docs/master/dynamo/faq.html#identifying-the-cause-of-a-graph-break](https://pytorch.org/docs/master/dynamo/faq.html#identifying-the-cause-of-a-graph-break)
- * Details on guards: [https://pytorch.org/docs/master/dynamo/guards-overview.html](https://pytorch.org/docs/master/dynamo/guards-overview.html)
+ * General compilation troubleshooting: [https://pytorch.org/docs/stable/torch.compiler_troubleshooting.html](https://pytorch.org/docs/stable/torch.compiler_troubleshooting.html)
+ * Details on graph breaks: [https://pytorch.org/docs/stable/torch.compiler_faq.html#identifying-the-cause-of-a-graph-break](https://pytorch.org/docs/stable/torch.compiler_faq.html#identifying-the-cause-of-a-graph-break)
+ * Details on guards: [https://pytorch.org/docs/stable/torch.compiler_guards_overview.html](https://pytorch.org/docs/stable/torch.compiler_guards_overview.html)
* Video deep dive on TorchDynamo [https://www.youtube.com/watch?v=egZB5Uxki0I](https://www.youtube.com/watch?v=egZB5Uxki0I)
* Tutorial on optimized attention in PyTorch 1.12: [https://pytorch.org/tutorials/beginner/bettertransformer_tutorial.html](https://pytorch.org/tutorials/beginner/bettertransformer_tutorial.html)
_posts/2023-06-28-path-achieve-low-inference-latency.md (+1 -1)
@@ -99,7 +99,7 @@ LLMs have a few properties that make them challenging for compiler optimizations
## Inference Tech Stack in PyTorch/XLA
- Our goal is to offer the AI community a high performance inference stack. PyTorch/XLA integrates with [TorchDynamo](https://pytorch.org/docs/stable/dynamo/index.html), [PjRt](https://pytorch.org/blog/pytorch-2.0-xla/#pjrt-runtime-beta), [OpenXLA](https://pytorch.org/blog/pytorch-2.0-xla-path-forward/), and various model parallelism schemes. TorchDynamo eliminates tracing overhead at runtime, PjRt enables efficient host-device communication; PyTorch/XLA traceable collectives enable model and data parallelism on LLaMA via [TorchDynamo](https://pytorch.org/docs/stable/dynamo/index.html). To try our results, please use our custom [torch](https://storage.googleapis.com/tpu-pytorch/wheels/tpuvm/torch-nightly+20230422-cp38-cp38-linux_x86_64.whl), [torch-xla](https://storage.googleapis.com/tpu-pytorch/wheels/tpuvm/torch_xla-nightly+20230422-cp38-cp38-linux_x86_64.whl) wheels to reproduce our [LLaMA inference solution](https://github.com/pytorch-tpu/llama/tree/blog). PyTorch/XLA 2.1 will support the features discussed in this post by default.
+ Our goal is to offer the AI community a high performance inference stack. PyTorch/XLA integrates with [TorchDynamo](https://pytorch.org/docs/stable/torch.compiler), [PjRt](https://pytorch.org/blog/pytorch-2.0-xla/#pjrt-runtime-beta), [OpenXLA](https://pytorch.org/blog/pytorch-2.0-xla-path-forward/), and various model parallelism schemes. TorchDynamo eliminates tracing overhead at runtime, PjRt enables efficient host-device communication; PyTorch/XLA traceable collectives enable model and data parallelism on LLaMA via [TorchDynamo](https://pytorch.org/docs/stable/torch.compiler). To try our results, please use our custom [torch](https://storage.googleapis.com/tpu-pytorch/wheels/tpuvm/torch-nightly+20230422-cp38-cp38-linux_x86_64.whl), [torch-xla](https://storage.googleapis.com/tpu-pytorch/wheels/tpuvm/torch_xla-nightly+20230422-cp38-cp38-linux_x86_64.whl) wheels to reproduce our [LLaMA inference solution](https://github.com/pytorch-tpu/llama/tree/blog). PyTorch/XLA 2.1 will support the features discussed in this post by default.
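For reference, the Dynamo entry point into PyTorch/XLA is exposed as a `torch.compile` backend. A minimal sketch, assuming the "openxla" backend name that PyTorch/XLA 2.1 registers (`MyModel` is a placeholder, not the custom wheels' exact API):

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = MyModel().to(device)  # MyModel: placeholder nn.Module

# Dynamo traces once up front, so steady-state steps skip Python tracing.
compiled = torch.compile(model, backend="openxla")
out = compiled(torch.randn(8, 128, device=device))
```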
<li><p>You can pass a string containing your backend function’s name to <code class="docutils literal notranslate"><span class="pre">torch.compile</span></code> instead of the function itself,
for example, <code class="docutils literal notranslate"><span class="pre">torch.compile(model,</span> <span class="pre">backend="my_compiler")</span></code>.</p></li>
- <li><p>It is required for use with the <a class="reference external" href="https://pytorch.org/docs/master/dynamo/troubleshooting.html">minifier</a>. Any generated
+ <li><p>It is required for use with the <a class="reference external" href="https://pytorch.org/docs/stable/torch.compiler_troubleshooting.html">minifier</a>. Any generated
code from the minifier must call your code that registers your backend function, typically through an <code class="docutils literal notranslate"><span class="pre">import</span></code> statement.</p></li>
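A sketch of the registration that makes the string form work; hedged, since `register_backend` lives in the private `torch._dynamo` namespace and its import path has moved between releases:

```python
import torch
from torch._dynamo import register_backend

@register_backend
def my_compiler(gm: torch.fx.GraphModule, example_inputs):
    # Pass-through "compiler": return the captured graph's forward unchanged.
    return gm.forward

# Once registered by name, the string works, which is what
# minifier-generated repro scripts rely on.
model = torch.nn.Linear(4, 4)
compiled = torch.compile(model, backend="my_compiler")
compiled(torch.randn(2, 4))
```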