diff --git a/13_Understanding_Hooks.ipynb b/13_Understanding_Hooks.ipynb new file mode 100644 index 0000000..059c391 --- /dev/null +++ b/13_Understanding_Hooks.ipynb @@ -0,0 +1,539 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d23d46ab", + "metadata": {}, + "source": [ + "Hooks in PyTorch are under-documented but provide powerful functionality during backpropagation. They can be compared to Doctor Fate, a lesser-known superhero.\n", + "\n", + "One of the reasons hooks are valuable is that they let you run custom code while the forward and backward passes are executing. A hook can be registered on a Tensor or an nn.Module and is essentially a function executed when either the forward or backward pass is called.\n", + "\n", + "When referring to \"forward,\" we don't mean the forward function of an nn.Module, where the output is computed. Instead, we mean the forward function of the `torch.autograd.Function` object that is the `grad_fn` of a Tensor.\n", + "\n", + "PyTorch provides two types of hooks:\n", + "- **Forward Hook**\n", + "- **Backward Hook**\n", + "\n", + "A forward hook is executed during the forward pass, while a backward hook is executed when the backward function is called.\n", + "\n", + "#### Hooks for Tensors\n", + "A hook is essentially a function with a specific signature. For tensors, the signature for the backward hook is:" + ] + },
+ { + "cell_type": "code", + "execution_count": 1, + "id": "088f6de7", + "metadata": {}, + "outputs": [], + "source": [ + "# hook(grad) -> Tensor or None" + ] + },
+ { + "cell_type": "markdown", + "id": "f8fb08cb", + "metadata": {}, + "source": [ + "There is no forward hook for a tensor.\n", + "\n", + "`grad` is the value contained in the grad attribute of the tensor after backward is called. The function should not modify its argument. It must either return None or a Tensor to be used in place of grad for further gradient computation. Here's an example:" + ] + },
+ { + "cell_type": "code", + "execution_count": 4, + "id": "0ea56adf", + "metadata": {}, + "outputs": [], + "source": [ + "import torch" + ] + },
+ { + "cell_type": "code", + "execution_count": 7, + "id": "ec348da8", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def backward_hook(grad):\n", + "    # Modify the gradient or perform some action\n", + "    modified_grad = grad * 2\n", + "    return modified_grad\n", + "\n", + "tensor = torch.tensor(1., requires_grad=True)\n", + "tensor.register_hook(backward_hook)" + ] + },
+ { + "cell_type": "markdown", + "id": "ace0ef1f", + "metadata": {}, + "source": [ + "In the first experiment, we compute the gradient of the scalar `c` with respect to tensors `a` and `b` using autograd, and then print the gradients of `a` and `b`. In the second experiment, we redo the same computation, but this time we register a hook on tensor `b` that multiplies its gradient by 2. 
Here's the code:" + ] + },
+ { + "cell_type": "code", + "execution_count": 9, + "id": "ab85cafd", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Experiment 1:\n", + "Gradient of a: tensor([0.4000, 0.4000, 0.4000, 0.4000, 0.4000])\n", + "Gradient of b: tensor([0.2000, 0.2000, 0.2000, 0.2000, 0.2000])\n", + "Experiment 2:\n", + "Gradient of a: tensor([0.8000, 0.8000, 0.8000, 0.8000, 0.8000])\n", + "Gradient of b: tensor([0.2000, 0.2000, 0.2000, 0.2000, 0.2000])\n" + ] + } + ], + "source": [ + "import torch \n", + "\n", + "# Experiment 1\n", + "a = torch.ones(5)\n", + "a.requires_grad = True\n", + "\n", + "b = 2*a\n", + "b.retain_grad()  # b is a non-leaf tensor, so its grad would not be retained otherwise.\n", + "\n", + "c = b.mean()\n", + "\n", + "c.backward()\n", + "\n", + "print(\"Experiment 1:\")\n", + "print(\"Gradient of a:\", a.grad)\n", + "print(\"Gradient of b:\", b.grad)\n", + "\n", + "# Experiment 2\n", + "a = torch.ones(5)\n", + "a.requires_grad = True\n", + "\n", + "b = 2*a\n", + "b.retain_grad()\n", + "\n", + "# Register a hook that multiplies b's grad by 2\n", + "b.register_hook(lambda grad: grad * 2) \n", + "\n", + "b.mean().backward() \n", + "\n", + "print(\"Experiment 2:\")\n", + "print(\"Gradient of a:\", a.grad)\n", + "print(\"Gradient of b:\", b.grad)" + ] + },
+ { + "cell_type": "markdown", + "id": "56733a1c", + "metadata": {}, + "source": [ + "Hooks provide versatile functionality during the backward pass. Here are some key uses:\n", + "\n", + "1. **Debugging and Logging**: Hooks allow you to print or log gradient values for debugging purposes. This is particularly useful for non-leaf tensors, whose gradients are not retained unless you explicitly call `retain_grad()` on them. Hooks provide a cleaner way to aggregate and inspect these values.\n", + "\n", + "2. **Gradient Modification**: Hooks enable you to modify gradients during the backward pass, which is crucial for various applications. For example, when we multiplied `b`'s gradient by 2 above, every subsequent gradient computation (such as the gradient of `a`, or of any tensor whose gradient flows through `b`) used the modified gradient (`2 * grad(b)`) instead of the original one; notice in the output that `b.grad` still shows the original value while `a.grad` reflects the doubled gradient. If you instead tried to achieve this by scaling gradients after the backward pass, you'd have to multiply each gradient by 2 manually, which would be cumbersome and error-prone.\n", + "\n", + "Using hooks streamlines the process of debugging, logging, and modifying gradients during backpropagation, adding flexibility to your PyTorch code.\n", + "\n", + "\n", + "#### Hooks for nn.Module\n", + "Hooks for `nn.Module` objects provide additional flexibility during the forward and backward passes of the neural network. However, their usage can sometimes break the abstraction of `nn.Module`. Here's an explanation of the hook function signatures:\n", + "\n", + "**Backward Hook:**" + ] + },
+ { + "cell_type": "code", + "execution_count": 10, + "id": "67bac3e6", + "metadata": {}, + "outputs": [], + "source": [ + "# hook(module, grad_input, grad_output) -> Tensor or None" + ] + },
+ { + "cell_type": "markdown", + "id": "1efe8f7f", + "metadata": {}, + "source": [ + "This hook function is executed during the backward pass. The `module` parameter is the `nn.Module` object to which the hook is attached. `grad_input` is a tuple containing the gradients of the loss with respect to the inputs of the `nn.Module` (e.g., `dL / dx`, `dL / dw`, `dL / db`). 
`grad_output` is a tuple containing the gradients of the loss with respect to the outputs of the `nn.Module`. However, because a single `nn.Module` may perform several autograd operations internally, what exactly these gradients refer to can be ambiguous.\n", + "\n", + "**Forward Hook:**" + ] + },
+ { + "cell_type": "code", + "execution_count": 11, + "id": "58b5a78b", + "metadata": {}, + "outputs": [], + "source": [ + "# hook(module, input, output) -> None" + ] + },
+ { + "cell_type": "markdown", + "id": "be7faca5", + "metadata": {}, + "source": [ + "This hook function is executed during the forward pass. As with the backward hook, the `module` parameter is the `nn.Module` object. `input` is a tuple containing the positional arguments passed to the module's forward call, and `output` is the value returned by that forward call.\n", + "\n", + "Using hooks on `nn.Module` objects can be powerful but may break the abstraction of `nn.Module`. An `nn.Module` is meant to represent a modularized layer, yet internally it may be composed of an arbitrary number of operations, so the `grad_input` and `grad_output` seen by a backward hook may not correspond to the module's logical inputs and outputs. Therefore, careful consideration should be given before using hooks on `nn.Module` objects." + ] + },
+ { + "cell_type": "code", + "execution_count": 12, + "id": "ca9a8815", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Linear(in_features=160, out_features=5, bias=True)\n", + "------------Input Grad------------\n", + "torch.Size([5])\n", + "torch.Size([5])\n", + "------------Output Grad------------\n", + "torch.Size([5])\n", + "\n", + "\n", + "Conv2d(3, 10, kernel_size=(2, 2), stride=(2, 2))\n", + "------------Input Grad------------\n", + "None found for Gradient\n", + "torch.Size([10, 3, 2, 2])\n", + "torch.Size([10])\n", + "------------Output Grad------------\n", + "torch.Size([1, 10, 4, 4])\n", + "\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "C:\\Users\\Hp\\.conda\\envs\\pytorch\\lib\\site-packages\\torch\\nn\\modules\\module.py:1117: UserWarning: Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. 
Please use register_full_backward_hook to get the documented behavior.\n", + "  warnings.warn(\"Using a non-full backward hook when the forward contains multiple autograd Nodes \"\n" + ] + } + ], + "source": [ + "import torch \n", + "import torch.nn as nn\n", + "\n", + "class myNet(nn.Module):\n", + "    def __init__(self):\n", + "        super().__init__()\n", + "        self.conv = nn.Conv2d(3, 10, 2, stride=2)\n", + "        self.relu = nn.ReLU()\n", + "        self.flatten = lambda x: x.view(-1)\n", + "        self.fc1 = nn.Linear(160, 5)\n", + "\n", + "    def forward(self, x):\n", + "        x = self.relu(self.conv(x))\n", + "        return self.fc1(self.flatten(x))\n", + "\n", + "net = myNet()\n", + "\n", + "def hook_fn(m, i, o):\n", + "    print(m)\n", + "    print(\"------------Input Grad------------\")\n", + "    for grad in i:\n", + "        try:\n", + "            print(grad.shape)\n", + "        except AttributeError: \n", + "            print(\"None found for Gradient\")\n", + "    print(\"------------Output Grad------------\")\n", + "    for grad in o: \n", + "        try:\n", + "            print(grad.shape)\n", + "        except AttributeError: \n", + "            print(\"None found for Gradient\")\n", + "    print(\"\\n\")\n", + "\n", + "net.conv.register_backward_hook(hook_fn)\n", + "net.fc1.register_backward_hook(hook_fn)\n", + "\n", + "inp = torch.randn(1, 3, 8, 8)\n", + "out = net(inp)\n", + "\n", + "(1 - out.mean()).backward()" + ] + },
+ { + "cell_type": "markdown", + "id": "b812deb8", + "metadata": {}, + "source": [ + "In this code snippet, we define a simple neural network `myNet` consisting of a convolutional layer, a ReLU activation, and a fully connected layer. We then define a hook function `hook_fn` that prints information about the input and output gradients of specific layers during the backward pass. Finally, we register the hook function on the convolutional and fully connected layers of the network and perform a forward pass followed by a backward pass.\n", + "\n", + "Here's a breakdown of what the code does:\n", + "\n", + "- We define a neural network class `myNet` that inherits from `nn.Module`.\n", + "- In the `__init__` method of `myNet`, we define the layers of the network: a convolutional layer, a ReLU activation, and a fully connected layer.\n", + "- We define the `forward` method of `myNet`, which specifies the forward pass of the network.\n", + "- We define a hook function `hook_fn` that takes three arguments: `m` (the module to which the hook is attached), `i` (the input gradients), and `o` (the output gradients).\n", + "- Inside the hook function, we print the module, the input gradients, and the output gradients.\n", + "- We register the hook function on the convolutional and fully connected layers of the network using the `register_backward_hook` method.\n", + "- We create an input tensor `inp` with random values and perform a forward pass through the network.\n", + "- We compute the loss as `1 - out.mean()` and perform a backward pass to compute gradients.\n", + "\n", + "When the backward pass is performed, the hook function `hook_fn` is called for each layer to which it is registered, printing information about the module (layer), its input gradients, and its output gradients.\n", + "\n", + "This allows us to inspect the gradients flowing through specific layers of the network during the backward pass, which can be helpful for debugging and understanding the behavior of the network during training.\n"
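, + "\n", + "As the UserWarning in the output indicates, `register_backward_hook` is deprecated when a module's forward pass contains multiple autograd nodes, and `register_full_backward_hook` is the recommended replacement. Below is a minimal sketch of the same experiment with the newer API (not run in this notebook; it assumes the `myNet` and `hook_fn` definitions from the cell above):\n", + "\n", + "```python\n", + "# Hypothetical variant of the cell above using the non-deprecated API.\n", + "net = myNet()\n", + "\n", + "# With full backward hooks, grad_input/grad_output refer to gradients\n", + "# with respect to the module's actual inputs and outputs.\n", + "net.conv.register_full_backward_hook(hook_fn)\n", + "net.fc1.register_full_backward_hook(hook_fn)\n", + "\n", + "out = net(torch.randn(1, 3, 8, 8))\n", + "(1 - out.mean()).backward()\n", + "```"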
+ ] + },
+ { + "cell_type": "markdown", + "id": "18d17e64", + "metadata": {}, + "source": [ + "#### Proper Way of Using Hooks" + ] + },
+ { + "cell_type": "code", + "execution_count": 13, + "id": "8fc8a70b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Gradients less than zero: False\n", + "The biases are tensor([0., 0., 0., 0., 0.])\n" + ] + } + ], + "source": [ + "import torch \n", + "import torch.nn as nn\n", + "\n", + "class myNet(nn.Module):\n", + "    def __init__(self):\n", + "        super().__init__()\n", + "        self.conv = nn.Conv2d(3, 10, 2, stride=2)\n", + "        self.relu = nn.ReLU()\n", + "        self.flatten = lambda x: x.view(-1)\n", + "        self.fc1 = nn.Linear(160, 5)\n", + "    \n", + "    def forward(self, x):\n", + "        x = self.relu(self.conv(x))\n", + "        \n", + "        # Register a hook that clamps the gradient of this activation to be non-negative\n", + "        x.register_hook(lambda grad: torch.clamp(grad, min=0))\n", + "        \n", + "        # Register a hook that prints whether any negative gradients remain\n", + "        x.register_hook(lambda grad: print(\"Gradients less than zero:\", bool((grad < 0).any())))\n", + "        \n", + "        return self.fc1(self.flatten(x))\n", + "\n", + "net = myNet()\n", + "\n", + "# Apply hooks to the biases of the linear layers to zero out their gradients\n", + "for name, param in net.named_parameters():\n", + "    if \"fc\" in name and \"bias\" in name:\n", + "        param.register_hook(lambda grad: torch.zeros(grad.shape))\n", + "\n", + "# Perform a forward pass\n", + "out = net(torch.randn(1, 3, 8, 8)) \n", + "\n", + "# Compute the loss and perform the backward pass\n", + "(1 - out).mean().backward()\n", + "\n", + "# Print the gradients of the biases\n", + "print(\"The biases are\", net.fc1.bias.grad) # Bias gradients should be zero" + ] + },
+ { + "cell_type": "markdown", + "id": "7925ccf7", + "metadata": {}, + "source": [ + "- **Clamping Convolutional Layer Gradients**: The hook registered with `x.register_hook(lambda grad: torch.clamp(grad, min=0))` clamps the gradient of the activation produced by the convolutional layer (`conv`) and ReLU to be non-negative, so no negative gradient is backpropagated through this part of the network.\n", + "\n", + "- **Checking Negative Gradients**: The hook registered with `x.register_hook(lambda grad: print(\"Gradients less than zero:\", bool((grad < 0).any())))` prints whether any negative gradients remain. Because hooks on a tensor run in the order they were registered, this hook sees the gradient after the clamping hook has modified it, so it effectively verifies that the clamping worked (the output prints `False`).\n", + "\n", + "- **Zeroing Bias Gradients**: The loop over named parameters applies hooks to the bias parameters of the linear layers (those whose names contain \"fc\" and \"bias\"), ensuring that their gradients are zeroed out during the backward pass." + ] + },
+ { + "cell_type": "markdown", + "id": "6acf61f5", + "metadata": {}, + "source": [ + "#### Visualizing Activations\n", + "\n", + "Using forward hooks to visualize activations is a common technique in deep learning. It allows you to capture intermediate feature maps during the forward pass for visualization and analysis. 
Here's how you can implement it:" + ] + },
+ { + "cell_type": "code", + "execution_count": 14, + "id": "e7b251bc", + "metadata": {}, + "outputs": [], + "source": [ + "import torch \n", + "import torch.nn as nn\n", + "\n", + "class myNet(nn.Module):\n", + "    def __init__(self):\n", + "        super().__init__()\n", + "        self.conv = nn.Conv2d(3, 10, 2, stride=2)\n", + "        self.relu = nn.ReLU()\n", + "        self.flatten = lambda x: x.view(-1)\n", + "        self.fc1 = nn.Linear(160, 5)\n", + "    \n", + "    def forward(self, x):\n", + "        x = self.relu(self.conv(x))\n", + "        return self.fc1(self.flatten(x))\n", + "\n", + "# Define a dictionary to store intermediate feature maps\n", + "visualisation = {}\n", + "\n", + "# Define input tensor\n", + "inp = torch.randn(1, 3, 8, 8)\n", + "\n", + "# Define hook function to save intermediate feature maps\n", + "def hook_fn(m, i, o):\n", + "    visualisation[m] = o\n", + "\n", + "# Create an instance of the network\n", + "net = myNet()\n", + "\n", + "# Register forward hooks for each top-level layer in the network\n", + "for name, layer in net._modules.items():\n", + "    layer.register_forward_hook(hook_fn)\n", + "\n", + "# Perform a forward pass through the network\n", + "out = net(inp)\n", + "\n", + "# Intermediate feature maps are now stored in the visualisation dictionary" + ] + },
+ { + "cell_type": "markdown", + "id": "0e488d36", + "metadata": {}, + "source": [ + "Alternatively, by modifying the forward method of the nn.Module subclass itself, we can append the intermediate outputs to a dictionary or list and access them after the forward pass. This approach also handles models that contain sequential blocks: rather than having to register hooks recursively on every layer inside an `nn.Sequential`, we simply store the outputs we care about at the points in `forward` where they are produced.\n", + "\n", + "Here's the adjusted code incorporating this approach:" + ] + },
+ { + "cell_type": "code", + "execution_count": 15, + "id": "f67c84fe", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "dict_keys(['fc1_output', 'seq_output'])\n" + ] + } + ], + "source": [ + "import torch \n", + "import torch.nn as nn\n", + "\n", + "class myNet(nn.Module):\n", + "    def __init__(self):\n", + "        super().__init__()\n", + "        self.conv = nn.Conv2d(3, 10, 2, stride=2)\n", + "        self.relu = nn.ReLU()\n", + "        self.flatten = lambda x: x.view(-1)\n", + "        self.fc1 = nn.Linear(160, 5)\n", + "        self.seq = nn.Sequential(nn.Linear(5, 3), nn.Linear(3, 2))\n", + "        \n", + "        # Dictionary to store intermediate feature maps\n", + "        self.visualisation = {}\n", + "\n", + "    def forward(self, x):\n", + "        x = self.relu(self.conv(x))\n", + "        x = self.fc1(self.flatten(x))\n", + "        \n", + "        # Store the intermediate output of the fc1 layer in the visualisation dictionary\n", + "        self.visualisation['fc1_output'] = x\n", + "        \n", + "        # Apply the sequential layers\n", + "        x = self.seq(x)\n", + "        \n", + "        # Store the output of the sequential block in the visualisation dictionary\n", + "        self.visualisation['seq_output'] = x\n", + "        \n", + "        return x\n", + "\n", + "net = myNet()\n", + "\n", + "# Perform a forward pass through the network\n", + "out = net(torch.randn(1, 3, 8, 8))\n", + "\n", + "# Check the keys of the visualisation dictionary to verify we captured the intended outputs\n", + "print(net.visualisation.keys())" + ] + },
+ { + "cell_type": "markdown", + "id": "b56615db", + "metadata": {}, + "source": [ + "Finally, you can turn these tensors into NumPy arrays and plot the activations.\n"
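, + "\n", + "For example, here is a minimal sketch of plotting one of the captured activations (it assumes `matplotlib` is installed; the plotting choice is illustrative and not part of the original code):\n", + "\n", + "```python\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# Detach a stored activation, move it to the CPU and convert it to a NumPy array.\n", + "act = net.visualisation['fc1_output'].detach().cpu().numpy()\n", + "\n", + "# fc1 produces a 1-D vector of 5 values here, so a simple bar plot works.\n", + "plt.bar(range(act.shape[-1]), act.ravel())\n", + "plt.title('fc1 activations')\n", + "plt.show()\n", + "```"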
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2ec01eb7", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.16" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/README.md b/README.md index d7cc0bb..cd2d48f 100644 --- a/README.md +++ b/README.md @@ -1,24 +1,8 @@ # PyTorch Fundamentals -
- -[![GitHub stars](https://img.shields.io/github/stars/ml-dev-world/pytorch-fundamentals.svg?style=social)](https://github.com/ml-dev-world/pytorch-fundamentals/stargazers) -    -[![GitHub forks](https://img.shields.io/github/forks/ml-dev-world/pytorch-fundamentals.svg?style=social)](https://github.com/ml-dev-world/pytorch-fundamentals/network/members) -    -[![GitHub pull requests](https://img.shields.io/github/issues-pr/ml-dev-world/pytorch-fundamentals.svg)](https://github.com/ml-dev-world/pytorch-fundamentals/pulls) -    -[![GitHub issues](https://img.shields.io/github/issues/ml-dev-world/pytorch-fundamentals.svg)](https://github.com/ml-dev-world/pytorch-fundamentals/issues) -    -[![GitHub contributors](https://img.shields.io/github/contributors/ml-dev-world/pytorch-fundamentals.svg)](https://github.com/ml-dev-world/pytorch-fundamentals/graphs/contributors) - -
- - - ![Deep Learning](https://images.ctfassets.net/rc8q7tcpu9y3/4N9rb37CEIfSE6MQCh7tLx/393b411ead140cdf9d9255dee2aa5a97/Facebook-PyTorch-Conference-Experience-Design-Social.jpg?w=1200&h=630&fit=fill&fm=jpg&q=90) -Welcome to PyTorch Fundamentals! This repository is designed to help you master PyTorch from beginner to advanced levels with practical examples and tutorials. Whether you're just starting your deep learning journey or looking to enhance your skills, you'll find concise explanations and hands-on exercises to guide you through the world of PyTorch. +Welcome to the PyTorch Fundamentals! This repository is designed to help you master PyTorch from beginner to advanced levels with practical examples and tutorials. Whether you're just starting your deep learning journey or looking to enhance your skills, you'll find concise explanations and hands-on exercises to guide you through the world of PyTorch. ## Prerequisites