_get_started/pytorch.md (+4 -4)
@@ -283,7 +283,7 @@ The minifier automatically reduces the issue you are seeing to a small snippet o
If you are not seeing the speedups that you expect, then we have the **torch.\_dynamo.explain** tool that explains which parts of your code induced what we call “graph breaks”. Graph breaks generally hinder the compiler from speeding up the code, and reducing the number of graph breaks likely will speed up your code (up to some limit of diminishing returns).
- You can read about these and more in our [troubleshooting guide](https://pytorch.org/docs/stable/dynamo/troubleshooting.html).
+ You can read about these and more in our [troubleshooting guide](https://pytorch.org/docs/stable/torch.compiler_troubleshooting.html).
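For concreteness, the `torch._dynamo.explain` tool mentioned above can be invoked roughly as follows; this is a minimal sketch, and the exact call signature and report format have shifted between 2.0 and later releases:

```python
import torch

def fn(x):
    x = torch.sin(x)
    print("side effect")  # untraceable Python side effect -> graph break
    return torch.cos(x)

# Reports the number of captured graphs, graph breaks, and their reasons.
explanation = torch._dynamo.explain(fn)(torch.randn(10))
print(explanation)
```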
### Dynamic Shapes
@@ -496,7 +496,7 @@ In 2.0, if you wrap your model in `model = torch.compile(model)`, your model goe
3. Graph compilation, where the kernels call their corresponding low-level device-specific operations.
9. **What new components does PT2.0 add to PT?**
- - **TorchDynamo** generates FX Graphs from Python bytecode. It maintains the eager-mode capabilities using [guards](https://pytorch.org/docs/stable/dynamo/guards-overview.html#caching-and-guards-overview) to ensure the generated graphs are valid ([read more](https://dev-discuss.pytorch.org/t/torchdynamo-an-experiment-in-dynamic-python-bytecode-transformation/361))
+ - **TorchDynamo** generates FX Graphs from Python bytecode. It maintains the eager-mode capabilities using [guards](https://pytorch.org/docs/stable/torch.compiler_guards_overview.html#caching-and-guards-overview) to ensure the generated graphs are valid ([read more](https://dev-discuss.pytorch.org/t/torchdynamo-an-experiment-in-dynamic-python-bytecode-transformation/361))
- **AOTAutograd** to generate the backward graph corresponding to the forward graph captured by TorchDynamo ([read more](https://dev-discuss.pytorch.org/t/torchdynamo-update-6-training-support-with-aotautograd/570)).
- **PrimTorch** to decompose complicated PyTorch operations into simpler and more elementary ops ([read more](https://dev-discuss.pytorch.org/t/tracing-with-primitives-update-2/645)).
- **\[Backend]** Backends integrate with TorchDynamo to compile the graph into IR that can run on accelerators. For example, **TorchInductor** compiles the graph to either **Triton** for GPU execution or **OpenMP** for CPU execution ([read more](https://dev-discuss.pytorch.org/t/torchinductor-a-pytorch-native-compiler-with-define-by-run-ir-and-symbolic-shapes/747)).
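To make the TorchDynamo-to-backend handoff above concrete, here is a minimal sketch of a pass-through backend that prints the FX graph Dynamo captured (`inspect_backend` and `fn` are illustrative names, not part of the docs):

```python
import torch

def inspect_backend(gm: torch.fx.GraphModule, example_inputs):
    # gm is the FX graph TorchDynamo extracted from Python bytecode.
    gm.graph.print_tabular()
    return gm.forward  # run the captured graph as-is, with no optimization

@torch.compile(backend=inspect_backend)
def fn(x):
    return torch.relu(x) + 1.0

fn(torch.randn(4))
```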
@@ -511,10 +511,10 @@ DDP and FSDP in Compiled mode can run up to 15% faster than Eager-Mode in FP32
The [PyTorch Developers forum](http://dev-discuss.pytorch.org/) is the best place to learn about 2.0 components directly from the developers who build them.
13. **Help my code is running slower with 2.0’s Compiled Mode!**
- The most likely reason for performance hits is too many graph breaks. For instance, something innocuous as a print statement in your model’s forward triggers a graph break. We have ways to diagnose these - read more [here](https://pytorch.org/docs/stable/dynamo/faq.html#why-am-i-not-seeing-speedups).
+ The most likely reason for performance hits is too many graph breaks. For instance, something innocuous as a print statement in your model’s forward triggers a graph break. We have ways to diagnose these - read more [here](https://pytorch.org/docs/stable/torch.compiler_faq.html#why-am-i-not-seeing-speedups).
14. **My previously-running code is crashing with 2.0’s Compiled Mode! How do I debug it?**
- Here are some techniques to triage where your code might be failing, and printing helpful logs: [https://pytorch.org/docs/stable/dynamo/faq.html#why-is-my-code-crashing](https://pytorch.org/docs/stable/dynamo/faq.html#why-is-my-code-crashing).
+ Here are some techniques to triage where your code might be failing, and printing helpful logs: [https://pytorch.org/docs/stable/torch.compiler_faq.html#why-is-my-code-crashing](https://pytorch.org/docs/stable/torch.compiler_faq.html#why-is-my-code-crashing).
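Both FAQ entries come down to finding graph breaks and recompilations. Assuming PyTorch 2.1 or later, a quick way to surface them is the logging API sketched below (`model` and `example_input` are placeholders):

```python
import torch

# Log each graph break (with its reason) and each recompilation trigger.
torch._logging.set_logs(graph_breaks=True, recompiles=True)

compiled = torch.compile(model)   # model: your nn.Module
compiled(example_input)           # breaks/recompiles are logged as they occur
```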
_posts/2023-04-14-accelerated-generative-diffusion-models.md (+5 -5)
@@ -156,9 +156,9 @@ model = torch.compile(model)
```
- PyTorch compiler then turns Python code into a set of instructions which can be executed efficiently without Python overhead. The compilation happens dynamically the first time the code is executed. With the default behavior, under the hood PyTorch utilized [TorchDynamo](https://pytorch.org/docs/master/dynamo/index.html) to compile the code and [TorchInductor](https://dev-discuss.pytorch.org/t/torchinductor-a-pytorch-native-compiler-with-define-by-run-ir-and-symbolic-shapes/747) to further optimize it. See [this tutorial](https://pytorch.org/tutorials/intermediate/dynamo_tutorial.html) for more details.
+ PyTorch compiler then turns Python code into a set of instructions which can be executed efficiently without Python overhead. The compilation happens dynamically the first time the code is executed. With the default behavior, under the hood PyTorch utilized [TorchDynamo](https://pytorch.org/docs/stable/torch.compiler) to compile the code and [TorchInductor](https://dev-discuss.pytorch.org/t/torchinductor-a-pytorch-native-compiler-with-define-by-run-ir-and-symbolic-shapes/747) to further optimize it. See [this tutorial](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) for more details.
- Although the one-liner above is enough for compilation, certain modifications in the code can squeeze a larger speedup. In particular, one should avoid so-called graph breaks - places in the code which PyTorch can’t compile. As opposed to previous PyTorch compilation approaches (like TorchScript), PyTorch 2 compiler doesn’t break in this case. Instead it falls back on eager execution - so the code runs, but with reduced performance. We introduced a few minor changes to the model code to get rid of graph breaks. This included eliminating functions from libraries not supported by the compiler, such as `inspect.isfunction` and `einops.rearrange`. See this [doc](https://pytorch.org/docs/master/dynamo/faq.html#identifying-the-cause-of-a-graph-break) to learn more about graph breaks and how to eliminate them.
+ Although the one-liner above is enough for compilation, certain modifications in the code can squeeze a larger speedup. In particular, one should avoid so-called graph breaks - places in the code which PyTorch can’t compile. As opposed to previous PyTorch compilation approaches (like TorchScript), PyTorch 2 compiler doesn’t break in this case. Instead it falls back on eager execution - so the code runs, but with reduced performance. We introduced a few minor changes to the model code to get rid of graph breaks. This included eliminating functions from libraries not supported by the compiler, such as `inspect.isfunction` and `einops.rearrange`. See this [doc](https://pytorch.org/docs/stable/torch.compiler_faq.html#identifying-the-cause-of-a-graph-break) to learn more about graph breaks and how to eliminate them.
Theoretically, one can apply `torch.compile` on the whole diffusion sampling loop. However, in practice it is enough to just compile the U-Net. The reason is that `torch.compile` doesn’t yet have a loop analyzer and would recompile the code for each iteration of the sampling loop. Moreover, compiled sampler code is likely to generate graph breaks - so one would need to adjust it if one wants to get a good performance from the compiled version.
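As a sketch of the compile-just-the-U-Net advice, assuming the diffusers `StableDiffusionPipeline` API rather than the blog's exact code:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Compile only the U-Net: it dominates the runtime, while compiling the
# full sampling loop would recompile (or graph-break) on every iteration.
pipe.unet = torch.compile(pipe.unet)

image = pipe("a photo of an astronaut riding a horse").images[0]
```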
@@ -503,9 +503,9 @@ See if you can increase performance of open source diffusion models using the me
* PyTorch 2.0 overview, which has a lot of information on `torch.compile`: [https://pytorch.org/get-started/pytorch-2.0/](https://pytorch.org/get-started/pytorch-2.0/)
* Tutorial on `torch.compile`: [https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html)
- * General compilation troubleshooting: [https://pytorch.org/docs/master/dynamo/troubleshooting.html](https://pytorch.org/docs/master/dynamo/troubleshooting.html)
- * Details on graph breaks: [https://pytorch.org/docs/master/dynamo/faq.html#identifying-the-cause-of-a-graph-break](https://pytorch.org/docs/master/dynamo/faq.html#identifying-the-cause-of-a-graph-break)
- * Details on guards: [https://pytorch.org/docs/master/dynamo/guards-overview.html](https://pytorch.org/docs/master/dynamo/guards-overview.html)
+ * General compilation troubleshooting: [https://pytorch.org/docs/stable/torch.compiler_troubleshooting.html](https://pytorch.org/docs/stable/torch.compiler_troubleshooting.html)
+ * Details on graph breaks: [https://pytorch.org/docs/stable/torch.compiler_faq.html#identifying-the-cause-of-a-graph-break](https://pytorch.org/docs/stable/torch.compiler_faq.html#identifying-the-cause-of-a-graph-break)
+ * Details on guards: [https://pytorch.org/docs/stable/torch.compiler_guards_overview.html](https://pytorch.org/docs/stable/torch.compiler_guards_overview.html)
* Video deep dive on TorchDynamo [https://www.youtube.com/watch?v=egZB5Uxki0I](https://www.youtube.com/watch?v=egZB5Uxki0I)
* Tutorial on optimized attention in PyTorch 1.12: [https://pytorch.org/tutorials/beginner/bettertransformer_tutorial.html](https://pytorch.org/tutorials/beginner/bettertransformer_tutorial.html)
_posts/2023-06-28-path-achieve-low-inference-latency.md (+1 -1)
@@ -99,7 +99,7 @@ LLMs have a few properties that make them challenging for compiler optimizations
## Inference Tech Stack in PyTorch/XLA
- Our goal is to offer the AI community a high performance inference stack. PyTorch/XLA integrates with [TorchDynamo](https://pytorch.org/docs/stable/dynamo/index.html), [PjRt](https://pytorch.org/blog/pytorch-2.0-xla/#pjrt-runtime-beta), [OpenXLA](https://pytorch.org/blog/pytorch-2.0-xla-path-forward/), and various model parallelism schemes. TorchDynamo eliminates tracing overhead at runtime, PjRt enables efficient host-device communication; PyTorch/XLA traceable collectives enable model and data parallelism on LLaMA via [TorchDynamo](https://pytorch.org/docs/stable/dynamo/index.html). To try our results, please use our custom [torch](https://storage.googleapis.com/tpu-pytorch/wheels/tpuvm/torch-nightly+20230422-cp38-cp38-linux_x86_64.whl), [torch-xla](https://storage.googleapis.com/tpu-pytorch/wheels/tpuvm/torch_xla-nightly+20230422-cp38-cp38-linux_x86_64.whl) wheels to reproduce our [LLaMA inference solution](https://github.com/pytorch-tpu/llama/tree/blog). PyTorch/XLA 2.1 will support the features discussed in this post by default.
+ Our goal is to offer the AI community a high performance inference stack. PyTorch/XLA integrates with [TorchDynamo](https://pytorch.org/docs/stable/torch.compiler), [PjRt](https://pytorch.org/blog/pytorch-2.0-xla/#pjrt-runtime-beta), [OpenXLA](https://pytorch.org/blog/pytorch-2.0-xla-path-forward/), and various model parallelism schemes. TorchDynamo eliminates tracing overhead at runtime, PjRt enables efficient host-device communication; PyTorch/XLA traceable collectives enable model and data parallelism on LLaMA via [TorchDynamo](https://pytorch.org/docs/stable/torch.compiler). To try our results, please use our custom [torch](https://storage.googleapis.com/tpu-pytorch/wheels/tpuvm/torch-nightly+20230422-cp38-cp38-linux_x86_64.whl), [torch-xla](https://storage.googleapis.com/tpu-pytorch/wheels/tpuvm/torch_xla-nightly+20230422-cp38-cp38-linux_x86_64.whl) wheels to reproduce our [LLaMA inference solution](https://github.com/pytorch-tpu/llama/tree/blog). PyTorch/XLA 2.1 will support the features discussed in this post by default.
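For reference, the Dynamo entry point into PyTorch/XLA is exposed as a `torch.compile` backend. A minimal sketch, assuming the "openxla" backend name that PyTorch/XLA 2.1 registers (`MyModel` is a placeholder, not the custom wheels' exact API):

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = MyModel().to(device)  # MyModel: placeholder nn.Module

# Dynamo traces once up front, so steady-state steps skip Python tracing.
compiled = torch.compile(model, backend="openxla")
out = compiled(torch.randn(8, 128, device=device))
```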
<li><p>You can pass a string containing your backend function’s name to <code class="docutils literal notranslate"><span class="pre">torch.compile</span></code> instead of the function itself,
for example, <code class="docutils literal notranslate"><span class="pre">torch.compile(model,</span> <span class="pre">backend="my_compiler")</span></code>.</p></li>
- <li><p>It is required for use with the <a class="reference external" href="https://pytorch.org/docs/master/dynamo/troubleshooting.html">minifier</a>. Any generated
+ <li><p>It is required for use with the <a class="reference external" href="https://pytorch.org/docs/stable/torch.compiler_troubleshooting.html">minifier</a>. Any generated
code from the minifier must call your code that registers your backend function, typically through an <code class="docutils literal notranslate"><span class="pre">import</span></code> statement.</p></li>
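A sketch of the registration that makes the string form work; hedged, since `register_backend` lives in the private `torch._dynamo` namespace and its import path has moved between releases:

```python
import torch
from torch._dynamo import register_backend

@register_backend
def my_compiler(gm: torch.fx.GraphModule, example_inputs):
    # Pass-through "compiler": return the captured graph's forward unchanged.
    return gm.forward

# Once registered by name, the string works, which is what
# minifier-generated repro scripts rely on.
model = torch.nn.Linear(4, 4)
compiled = torch.compile(model, backend="my_compiler")
compiled(torch.randn(2, 4))
```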