Commit fadd7bb

Merge pull request jcjohnson#19 from jcjohnson/0.4
0.4
2 parents 0f1b88a + 997b3b7 commit fadd7bb

9 files changed: +320 −302 lines


README.md

Lines changed: 156 additions & 147 deletions
Large diffs are not rendered by default.

README_raw.md

Lines changed: 36 additions & 28 deletions
@@ -11,10 +11,15 @@ will have a single hidden layer, and will be trained with gradient descent to
 fit random data by minimizing the Euclidean distance between the network output
 and the true output.
 
+**NOTE:** These examples have been updated for PyTorch 0.4, which made several
+major changes to the core PyTorch API. Most notably, prior to 0.4 Tensors had
+to be wrapped in Variable objects to use autograd; this functionality has now
+been added directly to Tensors, and Variables are now deprecated.
+
 ### Table of Contents
 - <a href='#warm-up-numpy'>Warm-up: numpy</a>
 - <a href='#pytorch-tensors'>PyTorch: Tensors</a>
-- <a href='#pytorch-variables-and-autograd'>PyTorch: Variables and autograd</a>
+- <a href='#pytorch-autograd'>PyTorch: Autograd</a>
 - <a href='#pytorch-defining-new-autograd-functions'>PyTorch: Defining new autograd functions</a>
 - <a href='#tensorflow-static-graphs'>TensorFlow: Static Graphs</a>
 - <a href='#pytorch-nn'>PyTorch: nn</a>
@@ -46,24 +51,24 @@ unfortunately numpy won't be enough for modern deep learning.
 
 Here we introduce the most fundamental PyTorch concept: the **Tensor**. A PyTorch
 Tensor is conceptually identical to a numpy array: a Tensor is an n-dimensional
-array, and PyTorch provides many functions for operating on these Tensors. Like
-numpy arrays, PyTorch Tensors do not know anything about deep learning or
-computational graphs or gradients; they are a generic tool for scientific
+array, and PyTorch provides many functions for operating on these Tensors.
+Any computation you might want to perform with numpy can also be accomplished
+with PyTorch Tensors; you should think of them as a generic tool for scientific
 computing.
 
 However unlike numpy, PyTorch Tensors can utilize GPUs to accelerate their
-numeric computations. To run a PyTorch Tensor on GPU, you simply need to cast it
-to a new datatype.
+numeric computations. To run a PyTorch Tensor on GPU, you use the `device`
+argument when constructing a Tensor to place the Tensor on a GPU.
 
 Here we use PyTorch Tensors to fit a two-layer network to random data. Like the
-numpy example above we need to manually implement the forward and backward
-passes through the network:
+numpy example above we manually implement the forward and backward
+passes through the network, using operations on PyTorch Tensors:
 
 ```python
 :INCLUDE tensor/two_layer_net_tensor.py
 ```
 
-## PyTorch: Variables and autograd
+## PyTorch: Autograd
 
 In the above examples, we had to manually implement both the forward and
 backward passes of our neural network. Manually implementing the backward pass
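For reference outside the diff, here is a minimal sketch of the 0.4-style `device` construction described in the hunk above; the tensor sizes and the CPU fallback are illustrative assumptions, not taken from the repo:

```python
import torch

# Pick a device once and pass it to every Tensor constructor; this replaces
# the old pattern of casting Tensors to torch.cuda.FloatTensor.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.randn(64, 1000, device=device)   # lives on the GPU if one is available
w = torch.randn(1000, 100, device=device)
y = x.mm(w).clamp(min=0)                   # computation runs on the same device
print(y.shape, y.device)
```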
@@ -79,18 +84,21 @@ When using autograd, the forward pass of your network will define a
 functions that produce output Tensors from input Tensors. Backpropagating through
 this graph then allows you to easily compute gradients.
 
-This sounds complicated, it's pretty simple to use in practice. We wrap our
-PyTorch Tensors in **Variable** objects; a Variable represents a node in a
-computational graph. If `x` is a Variable then `x.data` is a Tensor, and
-`x.grad` is another Variable holding the gradient of `x` with respect to some
-scalar value.
-
-PyTorch Variables have the same API as PyTorch Tensors: (almost) any operation
-that you can perform on a Tensor also works on Variables; the difference is that
-using Variables defines a computational graph, allowing you to automatically
-compute gradients.
-
-Here we use PyTorch Variables and autograd to implement our two-layer network;
+This sounds complicated, but it's pretty simple to use in practice. If we want to
+compute gradients with respect to some Tensor, then we set `requires_grad=True`
+when constructing that Tensor. Any PyTorch operations on that Tensor will cause
+a computational graph to be constructed, allowing us to later perform backpropagation
+through the graph. If `x` is a Tensor with `requires_grad=True`, then after
+backpropagation `x.grad` will be another Tensor holding the gradient of `x` with
+respect to some scalar value.
+
+Sometimes you may wish to prevent PyTorch from building computational graphs when
+performing certain operations on Tensors with `requires_grad=True`; for example
+we usually don't want to backpropagate through the weight update steps when
+training a neural network. In such scenarios we can use the `torch.no_grad()`
+context manager to prevent the construction of a computational graph.
+
+Here we use PyTorch Tensors and autograd to implement our two-layer network;
 now we no longer need to manually implement the backward pass through the
 network:
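As a quick standalone illustration of the `requires_grad`, `backward()`, and `torch.no_grad()` behavior described in this hunk (the tensors and the scalar function here are arbitrary, not part of the repo):

```python
import torch

w = torch.randn(3, requires_grad=True)  # track operations on w
x = torch.randn(3)                      # no graph is built for x

loss = (w * x).sum() ** 2   # forward pass builds a graph ending in a scalar
loss.backward()             # populates w.grad; x.grad stays None
print(w.grad)

with torch.no_grad():
    w -= 0.1 * w.grad       # in-place update; no graph is recorded here
w.grad.zero_()              # clear the gradient before the next backward pass
```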

@@ -108,7 +116,7 @@ with respect to that same scalar value.
 In PyTorch we can easily define our own autograd operator by defining a subclass
 of `torch.autograd.Function` and implementing the `forward` and `backward` functions.
 We can then use our new autograd operator by constructing an instance and calling it
-like a function, passing Variables containing input data.
+like a function, passing Tensors containing input data.
 
 In this example we define our own custom autograd function for performing the ReLU
 nonlinearity, and use it to implement our two-layer network:
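For orientation, a minimal 0.4-style autograd Function might look like the following; `MyExp` is a hypothetical example rather than the ReLU operator used in the repo, and note that in 0.4 the operator is invoked through its `apply` method (as the file diff below shows) rather than by instantiating the class:

```python
import torch

class MyExp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        y = x.exp()
        ctx.save_for_backward(y)   # cache the output for the backward pass
        return y

    @staticmethod
    def backward(ctx, grad_output):
        y, = ctx.saved_tensors
        return grad_output * y     # d/dx exp(x) = exp(x)

x = torch.randn(5, requires_grad=True)
y = MyExp.apply(x).sum()
y.backward()
print(torch.allclose(x.grad, x.exp()))  # True
```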
@@ -168,8 +176,8 @@ raw computational graphs that are useful for building neural networks.
 
 In PyTorch, the `nn` package serves this same purpose. The `nn` package defines a set of
 **Modules**, which are roughly equivalent to neural network layers. A Module receives
-input Variables and computes output Variables, but may also hold internal state such as
-Variables containing learnable parameters. The `nn` package also defines a set of useful
+input Tensors and computes output Tensors, but may also hold internal state such as
+Tensors containing learnable parameters. The `nn` package also defines a set of useful
 loss functions that are commonly used when training neural networks.
 
 In this example we use the `nn` package to implement our two-layer network:
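A compressed sketch of the `nn`-package pattern described above; the dimensions match the two-layer example, but the repo's included file may differ in its details:

```python
import torch

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Each Module holds its own learnable parameters internally.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss()  # the nn package also provides common loss functions

y_pred = model(x)             # Modules are called like functions on input Tensors
loss = loss_fn(y_pred, y)
print(loss.item())
```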
@@ -180,8 +188,8 @@ In this example we use the `nn` package to implement our two-layer network:
 
 
 ## PyTorch: optim
-Up to this point we have updated the weights of our models by manually mutating the
-`.data` member for Variables holding learnable parameters. This is not a huge burden
+Up to this point we have updated the weights of our models by manually mutating
+Tensors holding learnable parameters. This is not a huge burden
 for simple optimization algorithms like stochastic gradient descent, but in practice
 we often train neural networks using more sophisticated optimizers like AdaGrad,
 RMSProp, Adam, etc.
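The `optim` pattern in a nutshell; this sketch uses a stand-in `model` and arbitrary hyperparameters rather than the repo's exact script:

```python
import torch

model = torch.nn.Linear(10, 1)          # any Module with learnable parameters
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

x, y = torch.randn(8, 10), torch.randn(8, 1)
for t in range(5):
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()   # clear gradients accumulated by previous iterations
    loss.backward()         # compute gradients of the loss w.r.t. the parameters
    optimizer.step()        # let the optimizer update the parameters for us
```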
@@ -200,8 +208,8 @@ will optimize the model using the Adam algorithm provided by the `optim` package
 ## PyTorch: Custom nn Modules
 Sometimes you will want to specify models that are more complex than a sequence of
 existing Modules; for these cases you can define your own Modules by subclassing
-`nn.Module` and defining a `forward` which receives input Variables and produces
-output Variables using other modules or other autograd operations on Variables.
+`nn.Module` and defining a `forward` which receives input Tensors and produces
+output Tensors using other modules or other autograd operations on Tensors.
 
 In this example we implement our two-layer network as a custom Module subclass:
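A minimal custom-Module sketch along the lines the paragraph describes; the class below is illustrative and may differ from the repo's actual implementation:

```python
import torch

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        # Child Modules registered here hold the learnable parameters.
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        # Arbitrary Tensor operations and other Modules can be used here.
        h_relu = self.linear1(x).clamp(min=0)
        return self.linear2(h_relu)

model = TwoLayerNet(1000, 100, 10)
y_pred = model(torch.randn(64, 1000))
print(y_pred.shape)  # torch.Size([64, 10])
```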

autograd/two_layer_net_autograd.py

Lines changed: 37 additions & 40 deletions
@@ -1,68 +1,65 @@
 import torch
-from torch.autograd import Variable
 
 """
 A fully-connected ReLU network with one hidden layer and no biases, trained to
 predict y from x by minimizing squared Euclidean distance.
 
 This implementation computes the forward pass using operations on PyTorch
-Variables, and uses PyTorch autograd to compute gradients.
+Tensors, and uses PyTorch autograd to compute gradients.
 
-A PyTorch Variable is a wrapper around a PyTorch Tensor, and represents a node
-in a computational graph. If x is a Variable then x.data is a Tensor giving its
-value, and x.grad is another Variable holding the gradient of x with respect to
-some scalar value.
-
-PyTorch Variables have the same API as PyTorch tensors: (almost) any operation
-you can do on a Tensor you can also do on a Variable; the difference is that
-autograd allows you to automatically compute gradients.
+When we create a PyTorch Tensor with requires_grad=True, then operations
+involving that Tensor will not just compute values; they will also build up
+a computational graph in the background, allowing us to easily backpropagate
+through the graph to compute gradients of some Tensors with respect to a
+downstream loss. Concretely if x is a Tensor with x.requires_grad == True then
+after backpropagation x.grad will be another Tensor holding the gradient of x
+with respect to some scalar value.
 """
 
-dtype = torch.FloatTensor
-# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU
+device = torch.device('cpu')
+# device = torch.device('cuda') # Uncomment this to run on GPU
 
 # N is batch size; D_in is input dimension;
 # H is hidden dimension; D_out is output dimension.
 N, D_in, H, D_out = 64, 1000, 100, 10
 
-# Create random Tensors to hold input and outputs, and wrap them in Variables.
-# Setting requires_grad=False indicates that we do not need to compute gradients
-# with respect to these Variables during the backward pass.
-x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
-y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)
+# Create random Tensors to hold input and outputs
+x = torch.randn(N, D_in, device=device)
+y = torch.randn(N, D_out, device=device)
 
-# Create random Tensors for weights, and wrap them in Variables.
-# Setting requires_grad=True indicates that we want to compute gradients with
-# respect to these Variables during the backward pass.
-w1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
-w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)
+# Create random Tensors for weights; setting requires_grad=True means that we
+# want to compute gradients for these Tensors during the backward pass.
+w1 = torch.randn(D_in, H, device=device, requires_grad=True)
+w2 = torch.randn(H, D_out, device=device, requires_grad=True)
 
 learning_rate = 1e-6
 for t in range(500):
-    # Forward pass: compute predicted y using operations on Variables; these
-    # are exactly the same operations we used to compute the forward pass using
-    # Tensors, but we do not need to keep references to intermediate values since
-    # we are not implementing the backward pass by hand.
+    # Forward pass: compute predicted y using operations on Tensors. Since w1 and
+    # w2 have requires_grad=True, operations involving these Tensors will cause
+    # PyTorch to build a computational graph, allowing automatic computation of
+    # gradients. Since we are no longer implementing the backward pass by hand we
+    # don't need to keep references to intermediate values.
     y_pred = x.mm(w1).clamp(min=0).mm(w2)
 
-    # Compute and print loss using operations on Variables.
-    # Now loss is a Variable of shape (1,) and loss.data is a Tensor of shape
-    # (1,); loss.data[0] is a scalar value holding the loss.
+    # Compute and print loss. Loss is a Tensor of shape (), and loss.item()
+    # is a Python number giving its value.
     loss = (y_pred - y).pow(2).sum()
-    print(t, loss.data[0])
+    print(t, loss.item())
 
     # Use autograd to compute the backward pass. This call will compute the
-    # gradient of loss with respect to all Variables with requires_grad=True.
-    # After this call w1.grad and w2.grad will be Variables holding the gradient
+    # gradient of loss with respect to all Tensors with requires_grad=True.
+    # After this call w1.grad and w2.grad will be Tensors holding the gradient
    # of the loss with respect to w1 and w2 respectively.
     loss.backward()
 
-    # Update weights using gradient descent; w1.data and w2.data are Tensors,
-    # w1.grad and w2.grad are Variables and w1.grad.data and w2.grad.data are
-    # Tensors.
-    w1.data -= learning_rate * w1.grad.data
-    w2.data -= learning_rate * w2.grad.data
+    # Update weights using gradient descent. For this step we just want to mutate
+    # the values of w1 and w2 in-place; we don't want to build up a computational
+    # graph for the update steps, so we use the torch.no_grad() context manager
+    # to prevent PyTorch from building a computational graph for the updates
+    with torch.no_grad():
+        w1 -= learning_rate * w1.grad
+        w2 -= learning_rate * w2.grad
 
-    # Manually zero the gradients after running the backward pass
-    w1.grad.data.zero_()
-    w2.grad.data.zero_()
+        # Manually zero the gradients after running the backward pass
+        w1.grad.zero_()
+        w2.grad.zero_()

autograd/two_layer_net_custom_function.py

Lines changed: 39 additions & 38 deletions
@@ -1,12 +1,11 @@
 import torch
-from torch.autograd import Variable
 
 """
 A fully-connected ReLU network with one hidden layer and no biases, trained to
 predict y from x by minimizing squared Euclidean distance.
 
 This implementation computes the forward pass using operations on PyTorch
-Variables, and uses PyTorch autograd to compute gradients.
+Tensors, and uses PyTorch autograd to compute gradients.
 
 In this implementation we implement our own custom autograd function to perform
 the ReLU function.
@@ -18,62 +17,64 @@ class MyReLU(torch.autograd.Function):
     torch.autograd.Function and implementing the forward and backward passes
     which operate on Tensors.
     """
-    def forward(self, input):
+    @staticmethod
+    def forward(ctx, x):
         """
-        In the forward pass we receive a Tensor containing the input and return a
-        Tensor containing the output. You can cache arbitrary Tensors for use in the
-        backward pass using the save_for_backward method.
+        In the forward pass we receive a context object and a Tensor containing the
+        input; we must return a Tensor containing the output, and we can use the
+        context object to cache objects for use in the backward pass.
         """
-        self.save_for_backward(input)
-        return input.clamp(min=0)
+        ctx.save_for_backward(x)
+        return x.clamp(min=0)
 
-    def backward(self, grad_output):
+    def backward(ctx, grad_output):
         """
-        In the backward pass we receive a Tensor containing the gradient of the loss
-        with respect to the output, and we need to compute the gradient of the loss
-        with respect to the input.
+        In the backward pass we receive the context object and a Tensor containing
+        the gradient of the loss with respect to the output produced during the
+        forward pass. We can retrieve cached data from the context object, and must
+        compute and return the gradient of the loss with respect to the input to the
+        forward function.
         """
-        input, = self.saved_tensors
-        grad_input = grad_output.clone()
-        grad_input[input < 0] = 0
-        return grad_input
+        x, = ctx.saved_tensors
+        grad_x = grad_output.clone()
+        grad_x[x < 0] = 0
+        return grad_x
 
 
-dtype = torch.FloatTensor
-# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU
+device = torch.device('cpu')
+# device = torch.device('cuda') # Uncomment this to run on GPU
 
 # N is batch size; D_in is input dimension;
 # H is hidden dimension; D_out is output dimension.
 N, D_in, H, D_out = 64, 1000, 100, 10
 
-# Create random Tensors to hold input and outputs, and wrap them in Variables.
-x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
-y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)
+# Create random Tensors to hold input and output
+x = torch.randn(N, D_in, device=device)
+y = torch.randn(N, D_out, device=device)
 
-# Create random Tensors for weights, and wrap them in Variables.
-w1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
-w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)
+# Create random Tensors for weights.
+w1 = torch.randn(D_in, H, device=device, requires_grad=True)
+w2 = torch.randn(H, D_out, device=device, requires_grad=True)
 
 learning_rate = 1e-6
 for t in range(500):
-    # Construct an instance of our MyReLU class to use in our network
-    relu = MyReLU()
-
-    # Forward pass: compute predicted y using operations on Variables; we compute
-    # ReLU using our custom autograd operation.
-    y_pred = relu(x.mm(w1)).mm(w2)
-
+    # Forward pass: compute predicted y using operations on Tensors; we call our
+    # custom ReLU implementation using the MyReLU.apply function
+    y_pred = MyReLU.apply(x.mm(w1)).mm(w2)
+
     # Compute and print loss
     loss = (y_pred - y).pow(2).sum()
-    print(t, loss.data[0])
+    print(t, loss.item())
 
     # Use autograd to compute the backward pass.
     loss.backward()
 
-    # Update weights using gradient descent
-    w1.data -= learning_rate * w1.grad.data
-    w2.data -= learning_rate * w2.grad.data
+    with torch.no_grad():
+        # Update weights using gradient descent
+        w1 -= learning_rate * w1.grad
+        w2 -= learning_rate * w2.grad
+
+        # Manually zero the gradients after running the backward pass
+        w1.grad.zero_()
+        w2.grad.zero_()
 
-    # Manually zero the gradients after running the backward pass
-    w1.grad.data.zero_()
-    w2.grad.data.zero_()

nn/dynamic_net.py

Lines changed: 4 additions & 5 deletions
@@ -1,6 +1,5 @@
 import random
 import torch
-from torch.autograd import Variable
 
 """
 To showcase the power of PyTorch dynamic graphs, we will implement a very strange
@@ -45,9 +44,9 @@ def forward(self, x):
 # H is hidden dimension; D_out is output dimension.
 N, D_in, H, D_out = 64, 1000, 100, 10
 
-# Create random Tensors to hold inputs and outputs, and wrap them in Variables
-x = Variable(torch.randn(N, D_in))
-y = Variable(torch.randn(N, D_out), requires_grad=False)
+# Create random Tensors to hold inputs and outputs.
+x = torch.randn(N, D_in)
+y = torch.randn(N, D_out)
 
 # Construct our model by instantiating the class defined above
 model = DynamicNet(D_in, H, D_out)
@@ -62,7 +61,7 @@ def forward(self, x):
 
     # Compute and print loss
     loss = criterion(y_pred, y)
-    print(t, loss.data[0])
+    print(t, loss.item())
 
     # Zero gradients, perform a backward pass, and update the weights.
     optimizer.zero_grad()
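For context on what `DynamicNet` does with these Tensors, a model of this kind typically reuses a middle layer a random number of times on every forward call; the sketch below is illustrative and not the file's exact code:

```python
import random
import torch

class DynamicNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(DynamicNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)

    def forward(self, x):
        # A fresh graph is built on every forward call, so ordinary Python
        # control flow (here, a random number of middle layers) is fine.
        h = self.input_linear(x).clamp(min=0)
        for _ in range(random.randint(0, 3)):
            h = self.middle_linear(h).clamp(min=0)
        return self.output_linear(h)
```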
