@@ -11,10 +11,15 @@ will have a single hidden layer, and will be trained with gradient descent to
 fit random data by minimizing the Euclidean distance between the network output
 and the true output.

+**NOTE:** These examples have been updated for PyTorch 0.4, which made several
+major changes to the core PyTorch API. Most notably, prior to 0.4 Tensors had
+to be wrapped in Variable objects to use autograd; this functionality has now
+been added directly to Tensors, and Variables are now deprecated.
+
 ### Table of Contents
 - <a href='#warm-up-numpy'>Warm-up: numpy</a>
 - <a href='#pytorch-tensors'>PyTorch: Tensors</a>
-- <a href='#pytorch-variables-and-autograd'>PyTorch: Variables and autograd</a>
+- <a href='#pytorch-autograd'>PyTorch: Autograd</a>
 - <a href='#pytorch-defining-new-autograd-functions'>PyTorch: Defining new autograd functions</a>
 - <a href='#tensorflow-static-graphs'>TensorFlow: Static Graphs</a>
 - <a href='#pytorch-nn'>PyTorch: nn</a>
@@ -46,24 +51,24 @@ unfortunately numpy won't be enough for modern deep learning.

 Here we introduce the most fundamental PyTorch concept: the **Tensor**. A PyTorch
 Tensor is conceptually identical to a numpy array: a Tensor is an n-dimensional
-array, and PyTorch provides many functions for operating on these Tensors. Like
-numpy arrays, PyTorch Tensors do not know anything about deep learning or
-computational graphs or gradients; they are a generic tool for scientific
+array, and PyTorch provides many functions for operating on these Tensors.
+Any computation you might want to perform with numpy can also be accomplished
+with PyTorch Tensors; you should think of them as a generic tool for scientific
 computing.

 However, unlike numpy, PyTorch Tensors can utilize GPUs to accelerate their
-numeric computations. To run a PyTorch Tensor on GPU, you simply need to cast it
-to a new datatype.
+numeric computations. To run a PyTorch Tensor on GPU, you use the `device`
+argument when constructing a Tensor to place the Tensor on a GPU.

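+As a quick illustration (a minimal sketch, separate from the included example
+files), the `device` argument lets you construct Tensors directly on a GPU,
+falling back to the CPU when CUDA is unavailable:
+
+```python
+import torch
+
+# Pick a device; CUDA is used only if it is available on this machine
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+
+# Construct Tensors directly on the chosen device
+x = torch.randn(64, 1000, device=device)
+w = torch.randn(1000, 10, device=device)
+
+# Operations between Tensors on the same device run on that device
+y = x.mm(w)
+```
+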
 Here we use PyTorch Tensors to fit a two-layer network to random data. Like the
-numpy example above we need to manually implement the forward and backward
-passes through the network:
+numpy example above we manually implement the forward and backward
+passes through the network, using operations on PyTorch Tensors:

 ```python
 :INCLUDE tensor/two_layer_net_tensor.py
 ```

-## PyTorch: Variables and autograd
+## PyTorch: Autograd

 In the above examples, we had to manually implement both the forward and
 backward passes of our neural network. Manually implementing the backward pass
@@ -79,18 +84,21 @@ When using autograd, the forward pass of your network will define a
 functions that produce output Tensors from input Tensors. Backpropagating through
 this graph then allows you to easily compute gradients.

-This sounds complicated, it's pretty simple to use in practice. We wrap our
-PyTorch Tensors in **Variable** objects; a Variable represents a node in a
-computational graph. If `x` is a Variable then `x.data` is a Tensor, and
-`x.grad` is another Variable holding the gradient of `x` with respect to some
-scalar value.
-
-PyTorch Variables have the same API as PyTorch Tensors: (almost) any operation
-that you can perform on a Tensor also works on Variables; the difference is that
-using Variables defines a computational graph, allowing you to automatically
-compute gradients.
-
-Here we use PyTorch Variables and autograd to implement our two-layer network;
+This sounds complicated, but it's pretty simple to use in practice. If we want to
+compute gradients with respect to some Tensor, then we set `requires_grad=True`
+when constructing that Tensor. Any PyTorch operations on that Tensor will cause
+a computational graph to be constructed, allowing us to later perform backpropagation
+through the graph. If `x` is a Tensor with `requires_grad=True`, then after
+backpropagation `x.grad` will be another Tensor holding the gradient of `x` with
+respect to some scalar value.
+
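+As a small standalone sketch (not one of the included examples), this is what
+that looks like for a toy scalar computation:
+
+```python
+import torch
+
+# Track gradients for x by setting requires_grad=True at construction
+x = torch.ones(3, requires_grad=True)
+
+# Operations on x implicitly build a computational graph
+y = (x * 2).sum()
+
+# Backpropagate from the scalar y; gradients accumulate into x.grad
+y.backward()
+print(x.grad)  # tensor([2., 2., 2.])
+```
+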
+Sometimes you may wish to prevent PyTorch from building computational graphs when
+performing certain operations on Tensors with `requires_grad=True`; for example
+we usually don't want to backpropagate through the weight update steps when
+training a neural network. In such scenarios we can use the `torch.no_grad()`
+context manager to prevent the construction of a computational graph.
+
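+For instance, a gradient-descent weight update might be wrapped in `torch.no_grad()`
+like this (a minimal sketch with an illustrative weight Tensor and learning rate):
+
+```python
+import torch
+
+learning_rate = 1e-2
+w = torch.randn(10, 5, requires_grad=True)
+
+loss = (w ** 2).sum()
+loss.backward()            # populates w.grad
+
+# The update itself should not be recorded in the computational graph
+with torch.no_grad():
+    w -= learning_rate * w.grad
+    w.grad.zero_()         # reset the gradient before the next iteration
+```
+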
+Here we use PyTorch Tensors and autograd to implement our two-layer network;
 now we no longer need to manually implement the backward pass through the
 network:

@@ -108,7 +116,7 @@ with respect to that same scalar value.
 In PyTorch we can easily define our own autograd operator by defining a subclass
 of `torch.autograd.Function` and implementing the `forward` and `backward` functions.
 We can then use our new autograd operator by constructing an instance and calling it
-like a function, passing Variables containing input data.
+like a function, passing Tensors containing input data.

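+One possible shape for such a Function, as a rough sketch: `MyReLU` is a
+hypothetical name (not necessarily the class defined in the included example),
+written here in the `@staticmethod`/`apply` style that also works on newer
+PyTorch releases:
+
+```python
+import torch
+
+class MyReLU(torch.autograd.Function):
+    @staticmethod
+    def forward(ctx, input):
+        # Save the input so the backward pass can use it
+        ctx.save_for_backward(input)
+        return input.clamp(min=0)
+
+    @staticmethod
+    def backward(ctx, grad_output):
+        # Gradient of ReLU: pass the gradient through only where input > 0
+        input, = ctx.saved_tensors
+        grad_input = grad_output.clone()
+        grad_input[input < 0] = 0
+        return grad_input
+
+x = torch.randn(4, requires_grad=True)
+y = MyReLU.apply(x).sum()
+y.backward()
+```
+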
 In this example we define our own custom autograd function for performing the ReLU
 nonlinearity, and use it to implement our two-layer network:
@@ -168,8 +176,8 @@ raw computational graphs that are useful for building neural networks.

 In PyTorch, the `nn` package serves this same purpose. The `nn` package defines a set of
 **Modules**, which are roughly equivalent to neural network layers. A Module receives
-input Variables and computes output Variables, but may also hold internal state such as
-Variables containing learnable parameters. The `nn` package also defines a set of useful
+input Tensors and computes output Tensors, but may also hold internal state such as
+Tensors containing learnable parameters. The `nn` package also defines a set of useful
 loss functions that are commonly used when training neural networks.

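+To give a flavor of the API (a minimal sketch with illustrative dimensions,
+separate from the included example), a two-layer network can be expressed by
+composing Modules:
+
+```python
+import torch
+
+# Modules are composed with nn.Sequential; each Linear layer holds its own weights
+model = torch.nn.Sequential(
+    torch.nn.Linear(1000, 100),
+    torch.nn.ReLU(),
+    torch.nn.Linear(100, 10),
+)
+loss_fn = torch.nn.MSELoss()
+
+x = torch.randn(64, 1000)
+y = torch.randn(64, 10)
+
+y_pred = model(x)            # Modules are called like functions on input Tensors
+loss = loss_fn(y_pred, y)    # loss functions also come from the nn package
+loss.backward()
+```
+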
 In this example we use the `nn` package to implement our two-layer network:
@@ -180,8 +188,8 @@ In this example we use the `nn` package to implement our two-layer network:


 ## PyTorch: optim
-Up to this point we have updated the weights of our models by manually mutating the
-`.data` member for Variables holding learnable parameters. This is not a huge burden
+Up to this point we have updated the weights of our models by manually mutating
+Tensors holding learnable parameters. This is not a huge burden
 for simple optimization algorithms like stochastic gradient descent, but in practice
 we often train neural networks using more sophisticated optimizers like AdaGrad,
 RMSProp, Adam, etc.
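+
+The usual training-step pattern with `optim` looks roughly like this (the model
+and data here are illustrative placeholders, not the included example):
+
+```python
+import torch
+
+model = torch.nn.Linear(1000, 10)                # any Module with learnable parameters
+optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
+
+x = torch.randn(64, 1000)
+y = torch.randn(64, 10)
+
+loss = torch.nn.functional.mse_loss(model(x), y)
+
+optimizer.zero_grad()   # clear gradients accumulated from previous iterations
+loss.backward()         # compute gradients of the loss w.r.t. the parameters
+optimizer.step()        # let the optimizer update the parameters
+```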
@@ -200,8 +208,8 @@ will optimize the model using the Adam algorithm provided by the `optim` package
 ## PyTorch: Custom nn Modules
 Sometimes you will want to specify models that are more complex than a sequence of
 existing Modules; for these cases you can define your own Modules by subclassing
-`nn.Module` and defining a `forward` which receives input Variables and produces
-output Variables using other modules or other autograd operations on Variables.
+`nn.Module` and defining a `forward` which receives input Tensors and produces
+output Tensors using other modules or other autograd operations on Tensors.

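+As a rough sketch, a custom Module might look like the following (`TwoLayerNet`
+and its dimensions are illustrative, not necessarily identical to the included
+example):
+
+```python
+import torch
+
+class TwoLayerNet(torch.nn.Module):
+    def __init__(self, d_in, h, d_out):
+        super(TwoLayerNet, self).__init__()
+        # Child Modules registered here hold the learnable parameters
+        self.linear1 = torch.nn.Linear(d_in, h)
+        self.linear2 = torch.nn.Linear(h, d_out)
+
+    def forward(self, x):
+        # Compose Modules and Tensor operations to define the forward pass
+        h_relu = self.linear1(x).clamp(min=0)
+        return self.linear2(h_relu)
+
+model = TwoLayerNet(1000, 100, 10)
+y_pred = model(torch.randn(64, 1000))
+```
+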
 In this example we implement our two-layer network as a custom Module subclass:
