@@ -46,24 +46,24 @@ unfortunately numpy won't be enough for modern deep learning.

Here we introduce the most fundamental PyTorch concept: the **Tensor**. A PyTorch
Tensor is conceptually identical to a numpy array: a Tensor is an n-dimensional
-array, and PyTorch provides many functions for operating on these Tensors. Like
-numpy arrays, PyTorch Tensors do not know anything about deep learning or
-computational graphs or gradients; they are a generic tool for scientific
+array, and PyTorch provides many functions for operating on these Tensors.
+Any computation you might want to perform with numpy can also be accomplished
+with PyTorch Tensors; you should think of them as a generic tool for scientific
computing.

However, unlike numpy, PyTorch Tensors can utilize GPUs to accelerate their
-numeric computations. To run a PyTorch Tensor on GPU, you simply need to cast it
-to a new datatype.
+numeric computations. To run a PyTorch Tensor on GPU, you use the `device`
+argument when constructing a Tensor to place it on the GPU.

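To make the last two points concrete, here is a minimal sketch of numpy-style operations on Tensors and of constructing a Tensor directly on the GPU via the `device` argument; the shapes are arbitrary, and the code falls back to CPU when no CUDA device is available:

```python
import torch

# Tensors support the same kinds of numeric operations as numpy arrays.
x = torch.randn(64, 1000)        # random data, analogous to np.random.randn
w = torch.randn(1000, 100)
y = x.mm(w).clamp(min=0)         # matrix multiply followed by a ReLU

# Construct a Tensor directly on the GPU by passing a device argument
# (falls back to CPU if no CUDA device is present).
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x_gpu = torch.randn(64, 1000, device=device)
```
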
Here we use PyTorch Tensors to fit a two-layer network to random data. Like the
-numpy example above we need to manually implement the forward and backward
-passes through the network:
+numpy example above, we manually implement the forward and backward
+passes through the network, using operations on PyTorch Tensors:

```python
:INCLUDE tensor/two_layer_net_tensor.py
```

-## PyTorch: Variables and autograd
+## PyTorch: Autograd

In the above examples, we had to manually implement both the forward and
backward passes of our neural network. Manually implementing the backward pass
@@ -79,16 +79,19 @@ When using autograd, the forward pass of your network will define a
functions that produce output Tensors from input Tensors. Backpropagating through
this graph then allows you to easily compute gradients.

-This sounds complicated, it's pretty simple to use in practice. We wrap our
-PyTorch Tensors in **Variable** objects; a Variable represents a node in a
-computational graph. If `x` is a Variable then `x.data` is a Tensor, and
-`x.grad` is another Variable holding the gradient of `x` with respect to some
-scalar value.
+This sounds complicated, but it's pretty simple to use in practice. If we want to
+compute gradients with respect to some Tensor, then we set `requires_grad=True`
+when constructing that Tensor. Any PyTorch operation on that Tensor will cause
+a computational graph to be constructed, allowing us to later perform backpropagation
+through the graph. If `x` is a Tensor with `requires_grad=True`, then after
+backpropagation `x.grad` will be another Tensor holding the gradient of `x` with
+respect to some scalar value.
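
As a small illustration of this API (the values and shapes here are arbitrary, not taken from the tutorial's network):

```python
import torch

x = torch.randn(3, requires_grad=True)  # track operations involving x
y = (x ** 2).sum()                       # builds a small computational graph
y.backward()                             # backpropagate from the scalar y

print(x.grad)                            # gradient of y with respect to x, i.e. 2 * x
```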

-PyTorch Variables have the same API as PyTorch Tensors: (almost) any operation
-that you can perform on a Tensor also works on Variables; the difference is that
-using Variables defines a computational graph, allowing you to automatically
-compute gradients.
+Sometimes you may wish to prevent PyTorch from building computational graphs when
+performing certain operations on Tensors with `requires_grad=True`; for example,
+we usually don't want to backpropagate through the weight update steps when
+training a neural network. In such scenarios we can use the `torch.no_grad()`
+context manager to prevent the construction of a computational graph.
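
A minimal sketch of such a weight update step; `w`, its gradient, and `learning_rate` below are stand-ins rather than the tutorial's actual variables:

```python
import torch

w = torch.randn(5, 5, requires_grad=True)
loss = (w ** 2).sum()
loss.backward()                    # populates w.grad

learning_rate = 1e-4
with torch.no_grad():
    # Operations in this block are not recorded in the computational graph.
    w -= learning_rate * w.grad
    w.grad.zero_()                 # clear the accumulated gradient for the next step
```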

Here we use PyTorch Tensors and autograd to implement our two-layer network;
now we no longer need to manually implement the backward pass through the
@@ -180,8 +183,8 @@ In this example we use the `nn` package to implement our two-layer network:


## PyTorch: optim
-Up to this point we have updated the weights of our models by manually mutating the
-`.data` member for Variables holding learnable parameters. This is not a huge burden
+Up to this point we have updated the weights of our models by manually mutating
+Tensors holding learnable parameters. This is not a huge burden
for simple optimization algorithms like stochastic gradient descent, but in practice
we often train neural networks using more sophisticated optimizers like AdaGrad,
RMSProp, Adam, etc.
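
As a rough sketch of what the `optim` API looks like in use (the model, data, and hyperparameters below are placeholders, not the tutorial's two-layer network):

```python
import torch

model = torch.nn.Linear(10, 1)                  # stand-in model with learnable parameters
x, y = torch.randn(32, 10), torch.randn(32, 1)  # random data
loss_fn = torch.nn.MSELoss()

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()   # clear gradients accumulated on the previous step
    loss.backward()         # compute gradients of the loss w.r.t. the parameters
    optimizer.step()        # let the optimizer update the parameters
```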